Learning to detect an oddball target

Nidhin Koshy Vaidhiyan and Rajesh Sundaresan 


Abstract 

We consider the problem of detecting an odd process among a group of Poisson point processes, all having the same rate 
except the odd process. The actual rates of the odd and non-odd processes are unknown to the decision maker. We consider a 
time-slotted sequential detection scenario where, at the beginning of each slot, the decision maker can choose which process to 
observe during that time slot. We are interested in policies that satisfy a given constraint on the probability of false detection. We 
propose a generalised likelihood ratio based sequential policy which, via suitable thresholding, can be made to satisfy the given 
constraint on the probability of false detection. Further, we show that the proposed policy is asymptotically optimal in terms of 
the conditional expected stopping time among all policies that satisfy the constraint on the probability of false detection. The 
asymptotic is as the probability of false detection is driven to zero. 

We apply our results to a particular visual search experiment studied recently by neuroscientists. Our model suggests a 
neuronal dissimilarity index for the visual search task. The neuronal dissimilarity index, when applied to visual search data from 
the particular experiment, correlates strongly with the behavioural data. However, the new dissimilarity index performs worse than 
some previously proposed neuronal dissimilarity indices. We explain why this may be attributed to the experimental conditions.


I. Introduction 

Consider K homogeneous Poisson point processes. All processes except one, which we call the “odd” process, have the 
same rate. The actual rates of the odd process and the non-odd processes are unknown. The objective is to detect the odd 
(or anomalous or outlier) process as quickly as possible, but subject to constraints on the probability of false detection. For 
simplicity, we assume that time is divided into slots of fixed duration T. During a particular time slot, the decision maker can 
choose exactly one among the K processes for observation. This choice is made only at slot beginnings. 

We cast the above problem into one of sequential detection with control m, but with unknown underlying distributions. 
The structural constraint in the problem, that exactly one among the K processes has a distribution different from the others,
opens up an opportunity to learn about the underlying distributions from the observations, and yet, learn just about enough to 
make a reliable decision. 

We adapt the sample complexity result of Kaufmann et al. [2], originally developed for the best arm identification problem,
to our setting and obtain a lower bound on the conditional expected stopping time for any policy that satisfies the constraint 
on the probability of false detection. The lower bound suggests that the conditional expected stopping time is asymptotically 
proportional to the negative of the logarithm of the probability of false detection. The proportionality constant is obtained as 
the solution to a max-min optimisation problem of relative entropies between the true system state (index of the odd process, 
its rate, and the rate of the non-odd processes) and other alternatives. The optimisation problem for the lower bound also 
suggests the nature of an asymptotically optimal strategy. 

The usual methodology employed in problems with lack of exact knowledge of the underlying distributions is to use tests 
that are based on generalised likelihood ratios (GLR tests or GLRT). We work with a modification of the GLRT. Unlike the
usual GLRT statistic, we replace the maximum likelihood function in the numerator of the statistic by an average likelihood 
function, the average computed with respect to an artificial prior on the odd and non-odd rates. For the Poisson model, we 
employ a gamma distribution on the rates of the odd and non-odd processes as the prior, with the shape and rate parameter set 
to one. In fact, any prior density having full support would suffice. The specific gamma prior allows easier characterisation of 
the averaged likelihood function. The averaging prevents overestimation of the likelihood ratio function, and at the same time
ensures that, asymptotically, the averaged version is not too far away from the true likelihood function. The modification allows 
us to design a time invariant and simple threshold policy that satisfies the probability of false detection constraint. We show 
that the sampling strategy of the proposed policy (which of the K processes to observe at the beginning of each slot) converges 
to the sampling strategy suggested by the lower bound, where the convergence is as the number of slots observed tends to 
infinity. We show that, asymptotically, the conditional expected stopping time of the proposed policy scales as $-\log(P_e)/D^*$,
where $P_e$ is the constraint on the probability of false detection and $D^*$, a relative entropy based constant, is the optimal scaling
factor as suggested by the lower bound.

The motivation to study this problem comes from a visual search problem studied by Sripati and Olson [3], where a subject
has to detect an odd image among a sea of distractor images “as quickly as possible without guessing” [3]. We model the
visual search task as an oddball detection problem, as above, and propose D* as a neuronal dissimilarity index for such 
visual search tasks. We compare the performance of the proposed dissimilarity index with other dissimilarity indices proposed 
earlier by Vaidhiyan et al. in iH. In that paper, it was assumed that the odd and the non-odd rates were known. Our proposed 
dissimilarity index of this paper correlates strongly with some behavioural data of 0. However, the proposed dissimilarity 
index performs slightly worse than the neuronal dissimilarity index proposed by Vaidhiyan et al. in 11. Nevertheless, we
present the comparisons on the existing experimental data. 

A. Prior Work 

Sequential hypothesis testing with control, assuming knowledge of the underlying distributions of the observations under 
different hypotheses, was first studied by Chernoff H. Such problems are also known as Active Sequential Hypothesis Testing
Problems (ASHT) E], 0. Chernoff ffl studied ASHT in the context of designing optimal experiments. His performance 
criterion was the total cost of sampling, which is proportional to delay, plus a penalty for false detection. Chernoff proposed 
a policy, the so-called Procedure A, and showed its asymptotic optimality as the cost of sampling went to zero. Procedure 
A maintains a posterior distribution on the set of hypotheses and, at each instant, selects actions according to the hypothesis 
with the highest posterior probability. In a series of works, Naghshvar and Javidi 0, 0, Q, 0, 0 studied ASHT from 
a Bayesian cost minimisation perspective. Nitinawarat et al. Go), im studied ASHT from the perspective of minimising the 
conditional expected cost (generally stopping delay), subject to constraints on the probability of false detection. All the above 
works assumed knowledge of the underlying distributions under different hypotheses. 

Li et al. ca studied fixed sample size outlier detection under unknown typical and outlier distributions, but in a finite 
observation space setting. They assumed simultaneous observability of all processes at each observation instance. They proposed 
a modified GLRT which was shown to have, asymptotically, the same error exponent as that of an optimal algorithm with 
knowledge of the underlying distributions. The asymptotics were as the number of processes available for observation tended
to infinity. They termed such algorithms asymptotically exponentially consistent. Further, they extended their study to the
setting where there is more than one outlier process. They extended their algorithm and showed that it is asymptotically
exponentially consistent in the new setting. Li et al. 03 studied sequential versions of na and showed that another modified
GLRT that keeps sampling until the test statistic crosses a threshold is universally consistent as the threshold is increased to 
infinity. In both these works, unlike in the ASHT setting and unlike our setting, at each observation instance, observations from 
all the processes were available to the decision maker. Nitinawarat and Veeravalli [14] studied an outlier detection problem in
a setting similar to the one being considered in this paper, where at each observation instance, the decision maker is allowed 
to observe only one of the processes. But different from our setting, they assume knowledge of the typical (or non-odd) 
distribution. They proposed an algorithm that was shown to have vanishing probability of false detection as the threshold is 
increased to infinity. Further, the proposed algorithm was shown to have, asymptotically, the same error exponent as that of an
optimal policy with knowledge of the atypical (odd) distribution. Recently, Cohen and Zhao [15] studied a problem similar to
ours, but restricted their study to the setting when the atypical (odd) and typical (non-odd) distributions belonged to disjoint 
parameter sets. Consequently, in their setting, the optimal action at each decision instance is to observe the process that has 
the generalised maximum likelihood with respect to the set of atypical (odd) parameters. Their proposed policy also had a 
threshold based stopping criterion. They showed that their policy has the same asymptotic scaling for the conditional expected 
stopping time as for an optimal policy with knowledge of the distributions. 

A related problem, studied extensively by the machine learning community, is the problem of identification of the best
arm for multi-armed bandits. Kaufmann et al. [2] studied the sample complexity of the best arm identification problem. Our
problem of anomaly detection can be cast as an odd-arm identification problem. However, the structures in the two problems
that need to be exploited are different.

B. Our Contribution 

Our asymptotically optimal algorithm differs from those in prior works in the following aspects: 

• Unlike the works on ASHT 0, 0, 0, 0, 0, 0, [10], [11], we assume no knowledge of the underlying distributions
under different hypotheses. However, our proposed algorithm is an adaptation of Chernoff’s Procedure A to our setting.

• For a given probability of false detection constraint, we propose a policy with a new modification of GLRT and a fixed
threshold such that it satisfies the constraint.

• Unlike the works of Li et al. [12], [13], our observations are limited by the chosen actions. There is then a clear exploration
versus exploitation tradeoff.

• Unlike the work of Nitinawarat and Veeravalli [14], we do not assume knowledge of the atypical (odd) distribution, nor
do we assume knowledge of the typical (non-odd) distribution.

• Unlike the work of Cohen and Zhao [15], we do not assume that the atypical and typical distributions belong to disjoint
sets.

• We specifically consider the setting of Poisson point processes mainly because of our desire to explain the experimental
observations of Sripati and Olson 0 on neuronal data which are modelled as Poisson point processes in 0. Nevertheless,
we believe that the same ideas may carry forward to other classes of distributions, especially exponential families.

C. Organisation

In Section II we develop the required notation and describe the model. In Section III we provide a lower bound on the
conditional expected stopping time for any policy that satisfies the probability of false detection constraint. The nature of the
lower bound suggests a candidate asymptotically optimal policy. In the same section, we make some observations on some
structural properties of the suggested policy. In Section IV we formally propose the policy and show that it is asymptotically
optimal. In Section V we apply the theory to visual search. We show that the proposed neuronal dissimilarity index is strongly
correlated with the behavioural data. In Section VI we make some concluding remarks and discuss possible extensions. Most
proofs are relegated to the appendices.


II. Model 

In this section we develop the required notation and describe the model. 

Let $K \ge 3$ denote the number of Poisson point processes under consideration. Conditioned on the rates, the processes are
assumed to be independent of each other. Let $H$, $1 \le H \le K$, denote the index of the odd process. Let $R_1 > 0$ denote the
unknown rate of the odd process, and let $R_2 > 0$ denote the unknown rate of the non-odd processes. We assume $R_1 \ne R_2$. Let
the triplet $\Psi = (H, R_1, R_2)$ denote the configuration of the processes, where the first component represents the index of the
odd process, while the second and third components represent the odd and non-odd rates respectively. Let $T$ denote the time
duration of a time slot. Without loss of generality we can assume $T = 1$; the analysis holds for general $T$ with an appropriate
scaling of the rates. The analysis can be done in continuous time as well, but we shall take the simpler slotted time approach.

Given the Poisson process assumption, a sufficient statistic for the observed process during a time slot is the number of
jumps observed in that time slot. Let $A_n \in \{1, 2, \ldots, K\}$ denote the index of the process chosen for observation in time slot $n$,
and let $X_n \in \mathbb{Z}_+$ denote the number of jumps observed in the process during time slot $n$. Let $(X_n)_{n \ge 1}$ and $(A_n)_{n \ge 1}$ denote
the observation process and the control process respectively. We write $X^n$ for $(X_1, X_2, \ldots, X_n)$ and $A^n$ for $(A_1, A_2, \ldots, A_n)$.
We also write $\mathcal{P}(K)$ for the set of probability distributions on $\{1, 2, \ldots, K\}$.

A policy $\pi$ is a sequence of action plans that at time $n$ looks at the history $X^{n-1}, A^{n-1}$ and prescribes a composite action
that is either (stop, $\delta$) or (continue, $\lambda$) as explained next. If the composite action is (stop, $\delta$), then the detector stops taking
further samples (or retires) and indicates $\delta$ as its decision on the hypotheses; $\delta \in \{1, 2, \ldots, K\}$. If the composite action is
(continue, $\lambda$), the detector picks the next action $A_n$ according to the distribution $\lambda \in \mathcal{P}(K)$. The stopping time is defined as

\[ \tau := \inf\{n \ge 1 : \text{the composite action at time } n \text{ is } (\text{stop}, \cdot)\}. \]


Consider a policy $\pi$. Conditioned on action $A_n$, the true hypothesis $H$, and the odd and non-odd rates $R_1$ and $R_2$, we
assume that the observation $X_n$ is independent of previous actions $A^{n-1}$, previous observations $X^{n-1}$, and the policy. The
conditional distribution of $X_n$, given the current action $A_n$, the configuration $\Psi = (H, R_1, R_2)$, the history $(X^{n-1}, A^{n-1})$, and
the Poisson assumption, is given by

\begin{align}
P(X_n = l \mid \Psi = (H, R_1, R_2), A_n, X^{n-1}, A^{n-1}) &= P(X_n = l \mid \Psi = (H, R_1, R_2), A_n) \tag{1} \\
&= \begin{cases} \dfrac{R_1^l e^{-R_1}}{l!} & \text{if } A_n = H \\[2mm] \dfrac{R_2^l e^{-R_2}}{l!} & \text{if } A_n \ne H, \end{cases} \tag{2}
\end{align}

where $l \in \mathbb{Z}_+$.
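
The observation model in (1) and (2) can be simulated directly. The following is a minimal sketch, assuming unit-length slots; the helper names `sample_poisson` and `observe`, the 1-based process indices, and the example rates are illustrative choices and do not come from the paper.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson(rate) variate by multiplying uniforms (Knuth's method)."""
    limit = math.exp(-rate)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def observe(action, odd_index, odd_rate, nonodd_rate, rng):
    """Number of jumps X_n in one unit-length slot of the observed process."""
    rate = odd_rate if action == odd_index else nonodd_rate
    return sample_poisson(rate, rng)


# Hypothetical configuration: process 2 is odd with rate 5, the others have rate 1.
rng = random.Random(0)
odd_draws = [observe(2, 2, 5.0, 1.0, rng) for _ in range(20000)]
other_draws = [observe(1, 2, 5.0, 1.0, rng) for _ in range(20000)]
mean_odd = sum(odd_draws) / len(odd_draws)
mean_other = sum(other_draws) / len(other_draws)
```

The empirical means should be close to the two rates, which is all the decision maker can learn from: the index of the odd process is never revealed directly.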

Let $E^\pi$ denote the conditional expectation and let $P^\pi$ denote the conditional probability measure, given $\Psi$, under the policy
$\pi$. Given an error tolerance vector $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_K)$, with $0 < \alpha_i < 1$, let $\Pi(\alpha)$ be the set of desirable policies defined as

\[ \Pi(\alpha) := \{\pi : P^\pi(\delta \ne i \mid \Psi = (H, R_1, R_2), H = i) \le \alpha_i \text{ for all } i \text{ and for all } \Psi \text{ such that } R_1 \ne R_2\}. \tag{3} \]

Let $\|\alpha\|$ denote $\max_i \alpha_i$.

For ease of notation, we drop the superscript $\pi$ while writing $E^\pi$, $P^\pi$, and other variables, but their dependence on the
underlying policy should be kept in mind, and the policy under consideration will be clear from the context.


III. The converse - Lower bound 

In this section we develop a lower bound on the conditional expected stopping time for any policy that belongs to $\Pi(\alpha)$.
We show that, as $\|\alpha\| \to 0$, the lower bound scales as $-\log(\|\alpha\|)/D^*$. We also characterise $D^*$ in detail in this section.

The following proposition gives a lower bound on the conditional expectation of the stopping time for all policies belonging
to $\Pi(\alpha)$. The proof may be seen as an application of the data processing inequality ( iflhl p. 16], ifTTlI ') for relative entropy.


Proposition 1: Fix $\alpha$, with $0 < \alpha_i < 1$ for each $i$. Let $\Psi = (i, R_1, R_2)$ be the true configuration. For any $\pi \in \Pi(\alpha)$, we
have

\[ E^\pi[\tau \mid \Psi] \ge \frac{d_b(\|\alpha\|, 1 - \|\alpha\|)}{D^*(i, R_1, R_2)}, \tag{4} \]

where $d_b(\|\alpha\|, 1 - \|\alpha\|)$ is the binary relative entropy function defined as

\[ d_b(x, 1 - x) := x \log(x/(1 - x)) + (1 - x) \log((1 - x)/x), \]

and $D^*(i, R_1, R_2)$ is defined as

\[ D^*(i, R_1, R_2) := \max_{\lambda \in \mathcal{P}(K)} \; \min_{R_1' > 0,\, R_2' > 0,\, j \ne i} \left[ \lambda(i) D(R_1 \| R_2') + \lambda(j) D(R_2 \| R_1') + (1 - \lambda(i) - \lambda(j)) D(R_2 \| R_2') \right], \tag{5} \]

where $D(x \| y) := x \log(x/y) - x + y$ is the KL-divergence or relative entropy between two Poisson random variables with
means $x$ and $y$.
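
The two information quantities in Proposition 1 are simple to evaluate numerically. The sketch below is illustrative only: the function names are hypothetical, and `stopping_time_lower_bound` takes $D^*$ as a given input rather than computing it. The loop prints the ratio of $d_b(\|\alpha\|, 1 - \|\alpha\|)$ to $-\log\|\alpha\|$, which approaches 1 as the tolerance shrinks.

```python
import math


def poisson_kl(x, y):
    """D(x||y) = x log(x/y) - x + y for Poisson means x >= 0 and y > 0."""
    if x == 0.0:
        return y  # limiting value, since x log x -> 0 as x -> 0
    return x * math.log(x / y) - x + y


def binary_relative_entropy(x):
    """d_b(x, 1 - x) = x log(x/(1-x)) + (1-x) log((1-x)/x), for 0 < x < 1."""
    return x * math.log(x / (1 - x)) + (1 - x) * math.log((1 - x) / x)


def stopping_time_lower_bound(alpha_norm, d_star):
    """Right-hand side of (4), for a given ||alpha|| and a given value of D*."""
    return binary_relative_entropy(alpha_norm) / d_star


for alpha_norm in (1e-2, 1e-4, 1e-8):
    ratio = binary_relative_entropy(alpha_norm) / (-math.log(alpha_norm))
    print(alpha_norm, ratio)
```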


Let $\lambda^*(i, R_1, R_2)$ denote the $\lambda \in \mathcal{P}(K)$ that maximises (5), i.e.,

\[ \lambda^*(i, R_1, R_2) = \arg\max_{\lambda \in \mathcal{P}(K)} \; \min_{R_1',\, R_2',\, j \ne i} \left[ \lambda(i) D(R_1 \| R_2') + \lambda(j) D(R_2 \| R_1') + (1 - \lambda(i) - \lambda(j)) D(R_2 \| R_2') \right]. \tag{6} \]

We can interpret $D^*(i, R_1, R_2)$ as the minimum among relative entropy rates between the true configuration $\Psi = (i, R_1, R_2)$
and all other possible alternate configurations $\Psi' = (j, R_1', R_2')$, with $j \ne i$, but maximised over all policies (action strategies)
that pick actions in an independent and identically distributed (i.i.d.) manner. It can also be interpreted as the max-min-drift
of the log likelihood ratio process between the true configuration and other error configurations, the minimum being over all
possible error configurations, and the maximum being over all i.i.d. policies. $D^*(i, R_1, R_2)$ is the key information quantity in
this paper. Since $d_b(\|\alpha\|, 1 - \|\alpha\|)/(-\log\|\alpha\|) \to 1$ as $\|\alpha\| \to 0$, Proposition 1 shows that the conditional expected stopping time
of the optimal policy scales at least as $-\log(\|\alpha\|)/D^*(i, R_1, R_2)$ as the probability of false detection constraint $\|\alpha\| \to 0$.
In Section IV we will describe a policy whose conditional expected stopping time is upper bounded by, and therefore achieves,
a similar scaling, though only asymptotically as $\|\alpha\| \to 0$.

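Proof of Proposition 1: Assume $E^\pi[\tau \mid \Psi]$ is finite, for otherwise (4) is trivially true. We apply the sample complexity
result of Kaufmann et al. [2, Lemma 1] to our setting. Let $N_j(\tau) = \sum_{k=1}^{\tau} 1_{\{A_k = j\}}$ denote the number of samples from
process $j$ observed till the stopping time $\tau$. Clearly, $\tau = \sum_{j=1}^{K} N_j(\tau)$. Kaufmann et al. [2, Lemma 1] showed that, for any
$\pi \in \Pi(\alpha)$, conditioned on the true configuration $\Psi = (i, R_1, R_2)$, and for any alternate configuration $\Psi' = (j, R_1', R_2')$, $j \ne i$,
the conditional expected sample sizes satisfy

\[ E^\pi[N_i(\tau) \mid \Psi] D(R_1 \| R_2') + E^\pi[N_j(\tau) \mid \Psi] D(R_2 \| R_1') + \sum_{k \ne i, j} E^\pi[N_k(\tau) \mid \Psi] D(R_2 \| R_2') \ge d_b(\|\alpha\|, 1 - \|\alpha\|). \tag{7} \]

Multiplying and then dividing the left-hand side by $E^\pi[\tau \mid \Psi]$, we get

\[ E^\pi[\tau \mid \Psi] \left[ \frac{E^\pi[N_i(\tau) \mid \Psi]}{E^\pi[\tau \mid \Psi]} D(R_1 \| R_2') + \frac{E^\pi[N_j(\tau) \mid \Psi]}{E^\pi[\tau \mid \Psi]} D(R_2 \| R_1') + \left( 1 - \frac{E^\pi[N_i(\tau) \mid \Psi] + E^\pi[N_j(\tau) \mid \Psi]}{E^\pi[\tau \mid \Psi]} \right) D(R_2 \| R_2') \right] \ge d_b(\|\alpha\|, 1 - \|\alpha\|). \tag{8} \]

Since (8) holds for any $R_1'$, $R_2'$ and $j \ne i$, and since $E^\pi[\tau \mid \Psi]$ does not depend on $R_1'$, $R_2'$ and $j \ne i$, we can choose the
tightest bound and get

\begin{align}
d_b(\|\alpha\|, 1 - \|\alpha\|) &\le E^\pi[\tau \mid \Psi] \min_{R_1',\, R_2',\, j \ne i} \left[ \frac{E^\pi[N_i(\tau) \mid \Psi]}{E^\pi[\tau \mid \Psi]} D(R_1 \| R_2') + \frac{E^\pi[N_j(\tau) \mid \Psi]}{E^\pi[\tau \mid \Psi]} D(R_2 \| R_1') + \left( 1 - \frac{E^\pi[N_i(\tau) \mid \Psi] + E^\pi[N_j(\tau) \mid \Psi]}{E^\pi[\tau \mid \Psi]} \right) D(R_2 \| R_2') \right] \tag{9} \\
&\le E^\pi[\tau \mid \Psi] \max_{\lambda \in \mathcal{P}(K)} \; \min_{R_1',\, R_2',\, j \ne i} \left[ \lambda(i) D(R_1 \| R_2') + \lambda(j) D(R_2 \| R_1') + (1 - \lambda(i) - \lambda(j)) D(R_2 \| R_2') \right]. \tag{10}
\end{align}

The last inequality follows because maximisation over all $\lambda \in \mathcal{P}(K)$ only increases the right-hand side. This completes the
proof. ■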


We now describe some simplifications for $D^*(i, R_1, R_2)$ and $\lambda^*(i, R_1, R_2)$. We show that the $K$-dimensional optimisation
in (5) can be reduced to a one-dimensional optimisation.
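
Proposition 2: Consider $K$ Poisson point processes with configuration $\Psi = (i, R_1, R_2)$. The quantity $D^*(i, R_1, R_2)$ of (5)
can be equivalently expressed as

\[ D^*(i, R_1, R_2) = \max_{0 \le \lambda(i) \le 1} \left[ \lambda(i) D(R_1 \| \tilde{R}) + (1 - \lambda(i)) \frac{(K-2)}{(K-1)} D(R_2 \| \tilde{R}) \right], \tag{11} \]

where

\[ \tilde{R} = \left( \lambda(i) R_1 + (1 - \lambda(i)) \frac{(K-2)}{(K-1)} R_2 \right) \Big/ \left( \lambda(i) + (1 - \lambda(i)) \frac{(K-2)}{(K-1)} \right). \tag{12} \]

Also, $\lambda^*(i, R_1, R_2)$ is of the form

\[ \lambda^*(i, R_1, R_2)(j) = \begin{cases} \lambda^*(i, R_1, R_2)(i) & \text{if } j = i \\ (1 - \lambda^*(i, R_1, R_2)(i))/(K-1) & \text{if } j \ne i. \end{cases} \tag{13} \]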


Proof: Consider (5). Observe that $R_1'$ appears only in the middle term on the right-hand side. This is minimised when
$R_1' = R_2$ and the minimum value is zero. We therefore have

\begin{align}
D^*(i, R_1, R_2) &= \max_{\lambda \in \mathcal{P}(K)} \; \min_{R_2',\, j \ne i} \left[ \lambda(i) D(R_1 \| R_2') + (1 - \lambda(i) - \lambda(j)) D(R_2 \| R_2') \right] \tag{14} \\
&= \max_{0 \le \lambda(i) \le 1} \; \min_{R_2'} \left[ \lambda(i) D(R_1 \| R_2') + (1 - \lambda(i)) \frac{(K-2)}{(K-1)} D(R_2 \| R_2') \right]. \tag{15}
\end{align}

Equation (15) follows from the fact that the $\lambda$ that maximises (14) will have equal mass on all locations other than $i$, i.e., the
maximiser $\lambda^*$ will satisfy $\lambda^*(j) = (1 - \lambda^*(i))/(K-1)$ for all $j \ne i$.

For a fixed $\lambda(i)$, to find the $R_2'$ that minimises the term within brackets in (15), which is a strictly convex function of $R_2'$,
we take its derivative with respect to $R_2'$ and equate it to zero. We then see that the minimising $R_2'$ satisfies the equation

\[ \lambda(i) D'(R_1 \| R_2') + (1 - \lambda(i)) \frac{(K-2)}{(K-1)} D'(R_2 \| R_2') = 0, \tag{16} \]

where $D'(x \| y)$ is the derivative of $D(x \| y)$ with respect to the second argument $y$, which turns out to be $1 - x/y$. The $R_2'$
thus obtained is

\[ R_2' = \left( \lambda(i) R_1 + (1 - \lambda(i)) \frac{(K-2)}{(K-1)} R_2 \right) \Big/ \left( \lambda(i) + (1 - \lambda(i)) \frac{(K-2)}{(K-1)} \right). \tag{17} \]

This completes the proof. ■
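
The one-dimensional form (11) lends itself to a simple numerical search. The sketch below is one possible way to evaluate $D^*$ and $\lambda^*$, not a method taken from the paper: it uses ternary search, which is applicable because the bracketed term, as a pointwise minimum over $R_2'$ of functions affine in $\lambda(i)$, is concave in $\lambda(i)$. All function names are hypothetical, and process indices are 0-based here.

```python
import math


def poisson_kl(x, y):
    """D(x||y) = x log(x/y) - x + y, with the convention 0 log 0 = 0."""
    return (x * math.log(x / y) if x > 0 else 0.0) - x + y


def inner_value(lam, r1, r2, k):
    """Bracketed term of (11), evaluated at the minimising rate (12)."""
    c = (1.0 - lam) * (k - 2) / (k - 1)
    r_tilde = (lam * r1 + c * r2) / (lam + c)
    return lam * poisson_kl(r1, r_tilde) + c * poisson_kl(r2, r_tilde)


def solve_d_star(r1, r2, k, iterations=200):
    """Return (D*, lambda*(i)) by ternary search over lambda(i) in [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if inner_value(m1, r1, r2, k) < inner_value(m2, r1, r2, k):
            lo = m1
        else:
            hi = m2
    lam_i = 0.5 * (lo + hi)
    return inner_value(lam_i, r1, r2, k), lam_i


def lambda_star(r1, r2, k, odd_index):
    """Sampling distribution (13) as a list indexed 0..k-1."""
    _, lam_i = solve_d_star(r1, r2, k)
    others = (1.0 - lam_i) / (k - 1)
    return [lam_i if j == odd_index else others for j in range(k)]


# Example: K = 3 and a non-odd rate very close to zero.
d_star, lam_odd = solve_d_star(1.0, 1e-9, 3)
```

In this example the mass placed on the odd process comes out close to 0.3. When the two rates are equal the objective is identically zero and the returned $\lambda(i)$ is arbitrary.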


As we will see, $\lambda^*(i, R_1, R_2)$ can be interpreted as the distribution on the set of actions of the optimal i.i.d. policy that
achieves $D^*(i, R_1, R_2)$. Heuristically, a good policy would attempt to have an action process whose empirical measure on
the set of actions approaches the distribution $\lambda^*(i, R_1, R_2)$, as $\|\alpha\| \to 0$. A closed form expression for $\lambda^*(i, R_1, R_2)$ is not
available. But we now describe some structural properties of $\lambda^*(i, R_1, R_2)$. In particular, we show that for any configuration $\Psi$,
all components of $\lambda^*(\Psi)$ are strictly bounded away from zero.

Proposition 3: Fix $K \ge 3$. Let $\lambda^*$ be as in (6). There exists a constant $c_K \in (0, 1)$, independent of $(k, \theta_1, \theta_2)$ but dependent
on $K$, such that

\[ \lambda^*(k, \theta_1, \theta_2)(j) \ge c_K > 0 \]

for all $j \in \{1, 2, \ldots, K\}$ and for all $(k, \theta_1, \theta_2)$ such that $\theta_1 > 0$, $\theta_2 > 0$ and $\theta_1 \ne \theta_2$.

Proof: See Appendix A. ■

Proposition 3 suggests that a good policy should sample each process at least a $c_K$ fraction of the time. Estimates of the
rate of each process should then converge to the corresponding true rate. We will make use of this fact in the analysis of our
proposed algorithm, which is to come shortly.

An explicit expansion of the objective function in (11) will show that $\lambda^*(k, \theta_1, \theta_2)(k)$ can be equivalently expressed as a
function of the ratio $\nu = \theta_1/(\theta_1 + \theta_2)$. Figure 1 shows the value of $\lambda^*(k, \theta_1, \theta_2)(k)$ for different values of $\nu$ and for different
$K$, with $K$ varying from 3 to 1000 and $\infty$. We observe the following:

1) $\lambda^*(k, \theta_1, \theta_2)(k)$ is lower bounded by $\approx 0.3$ for all $\nu$ and for all $K$, and $\lambda^*(k, \theta_1, \theta_2)(k)$ attains its minimum at $\nu = 1$
and for $K = 3$.

2) $\lambda^*(k, \theta_1, \theta_2)(k)$ is upper bounded by $\approx 0.7$ for all $\nu$ and for all $K$, and the maximum is approached at $\nu = 0$ and as
$K \to \infty$.

3) At $\nu = 1/2$, we have $R_1 = R_2$; the objective function in (11) is identically zero, and any $\lambda(k)$ works. We may take
$\lambda^*(k)$ to be the continuous extension of $\lambda^*(k)$ as $\nu \to 1/2$.

From the above observations, for a fixed $K$, we have $\lambda^*(k, \theta_1, \theta_2)(j) \ge (0.3/K)$ for all $j$ and for all $(k, \theta_1, \theta_2)$. In Appendix
A, where we prove Proposition 3, we obtain a looser bound for $\lambda^*(k, \theta_1, \theta_2)(j)$. We only show that $\lambda^*(k, \theta_1, \theta_2)(j) > 0.1/K$.

Fig. 1. $\lambda^*(k, \theta_1, \theta_2)(k)$ versus $\nu = \theta_1/(\theta_1 + \theta_2)$ for various $K$.


IV. Achievability - Modified GLRT

In this section we describe our proposed asymptotically optimal policy that achieves the lower bound in Proposition 1 as
the constraint on the probability of false detection is driven to zero. Our algorithm is an adaptation of Chernoff’s Procedure A. 
The likelihood ratio function in Procedure A is replaced by a modified generalised likelihood ratio function in our algorithm. 
The strategy at each time slot is not only a function of the hypothesis with the largest GLR statistic, but also a function of 
the maximum likelihood estimates of the odd and non-odd rates. 

Before describing the algorithm, we develop some required notation. 

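Let $N_i^n$ denote the number of times process $i$ was chosen for observation up to time $n$, i.e., $N_i^n = \sum_{t=1}^{n} 1_{\{A_t = i\}}$, so
$n = \sum_{i=1}^{K} N_i^n$. Let $Y_j^n$ denote the number of observed jumps in process $j$ up to time $n$; $Y_j^n = \sum_{t=1}^{n} X_t 1_{\{A_t = j\}}$. Let $Y^n$
denote the total number of observed jumps up to time $n$; $Y^n = \sum_{j=1}^{K} Y_j^n$.

Let $f(X^n, A^n \mid \Psi = (j, \theta_1, \theta_2))$ be the likelihood function of the observations and actions up to time $n$, conditioned on the
configuration $\Psi$, i.e.,

\[ f(X^n, A^n \mid \Psi = (j, \theta_1, \theta_2)) = \frac{1}{\prod_{t=1}^{n} (X_t!)} \, \theta_1^{Y_j^n} e^{-N_j^n \theta_1} \, \theta_2^{(Y^n - Y_j^n)} e^{-(n - N_j^n) \theta_2}. \tag{18} \]

Let $\beta_{11}, \beta_{12}, \beta_{21}, \beta_{22}$ be fixed constants, all greater than zero. Let

\begin{align}
f_{\beta_{11}, \beta_{12}, \beta_{21}, \beta_{22}}(\Psi = (j, \theta_1, \theta_2) \mid H = j) &:= f_{\beta_{11}, \beta_{12}}(\theta_1 \mid H = j) \, f_{\beta_{21}, \beta_{22}}(\theta_2 \mid H = j) \tag{19} \\
&= \frac{\beta_{12}^{\beta_{11}} \theta_1^{\beta_{11} - 1} e^{-\beta_{12} \theta_1}}{\Gamma(\beta_{11})} \cdot \frac{\beta_{22}^{\beta_{21}} \theta_2^{\beta_{21} - 1} e^{-\beta_{22} \theta_2}}{\Gamma(\beta_{21})} \tag{20}
\end{align}

denote the product gamma densities on the parameters $\theta_1$ and $\theta_2$. The Gamma distribution is a conjugate prior for the Poisson
distribution. We will use $f_{\beta_{11}, \beta_{12}, \beta_{21}, \beta_{22}}(\Psi = (j, \theta_1, \theta_2) \mid H = j)$ as an artificial prior on the parameter space $\Theta = \{(\theta_1, \theta_2)\}$ in
our proposed algorithm. While any positive $(\beta_{11}, \beta_{12}, \beta_{21}, \beta_{22})$ would suffice, $(\beta_{11}, \beta_{12}, \beta_{21}, \beta_{22}) = (1, 1, 1, 1)$ makes the
calculations and the presentation simpler. $\theta_1$ and $\theta_2$ then have the exponential distribution with mean 1.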

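Let $\hat{\theta}_j^n = (\hat{\theta}_{j,1}^n, \hat{\theta}_{j,2}^n)$ denote the maximum likelihood estimates of the odd and non-odd rates at time $n$ conditioned on $H = j$,
i.e.,

\[ \hat{\theta}_{j,1}^n = \frac{Y_j^n}{N_j^n} \quad \text{and} \quad \hat{\theta}_{j,2}^n = \frac{Y^n - Y_j^n}{n - N_j^n}. \tag{21} \]

Let

\begin{align}
\hat{f}(X^n, A^n \mid H = j) &:= \max_{\Psi : H = j} f(X^n, A^n \mid \Psi) \tag{22} \\
&= f(X^n, A^n \mid \Psi = (j, \hat{\theta}_{j,1}^n, \hat{\theta}_{j,2}^n)) \tag{23} \\
&= \frac{1}{\prod_{t=1}^{n} (X_t!)} \left( \frac{Y_j^n}{N_j^n} \right)^{Y_j^n} e^{-Y_j^n} \left( \frac{Y^n - Y_j^n}{n - N_j^n} \right)^{(Y^n - Y_j^n)} e^{-(Y^n - Y_j^n)} \tag{24}
\end{align}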
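denote the maximum likelihood of the observations and actions till time $n$ conditioned on $H = j$. The maximum is taken over
all possible odd and non-odd rates. Let the averaged likelihood function at time $n$, averaged according to the artificial prior
$f_{1,1,1,1}$ over all configurations $\Psi$ given $H = i$, be

\begin{align}
f(X^n, A^n \mid H = i) &:= \int f(X^n, A^n \mid \Psi = (i, \theta_1, \theta_2)) \, f_{1,1,1,1}(\Psi = (i, \theta_1, \theta_2) \mid H = i) \, d\theta_1 \, d\theta_2 \tag{25} \\
&= \frac{1}{\prod_{t=1}^{n} (X_t!)} \int \theta_1^{Y_i^n} e^{-N_i^n \theta_1} \, \theta_2^{(Y^n - Y_i^n)} e^{-(n - N_i^n) \theta_2} \, e^{-\theta_1} e^{-\theta_2} \, d\theta_1 \, d\theta_2 \tag{26} \\
&= \frac{1}{\prod_{t=1}^{n} (X_t!)} \cdot \frac{\Gamma(Y_i^n + 1)}{(N_i^n + 1)^{(Y_i^n + 1)}} \cdot \frac{\Gamma(Y^n - Y_i^n + 1)}{(n - N_i^n + 1)^{(Y^n - Y_i^n + 1)}}, \tag{27}
\end{align}

where the last equality follows by recognising the presence of Gamma($Y_i^n + 1$, $N_i^n + 1$) and Gamma($Y^n - Y_i^n + 1$, $n - N_i^n + 1$)
densities without scale factors in the integrand of the preceding expression. The modified GLR is defined as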

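\begin{align}
Z_{ij}(n) &:= \log \left( \frac{f(X^n, A^n \mid H = i)}{\hat{f}(X^n, A^n \mid H = j)} \right) \tag{28} \\
&= \log \left( \frac{\Gamma(Y_i^n + 1) \, \Gamma(Y^n - Y_i^n + 1)}{(N_i^n + 1)^{(Y_i^n + 1)} \, (n - N_i^n + 1)^{(Y^n - Y_i^n + 1)}} \right) - Y_j^n \left( \log \frac{Y_j^n}{N_j^n} - 1 \right) - (Y^n - Y_j^n) \left( \log \frac{(Y^n - Y_j^n)}{(n - N_j^n)} - 1 \right). \tag{29}
\end{align}

Note that the numerator is an averaged likelihood under $H = i$, averaged with respect to an artificial prior, and the denominator
is a maximum likelihood under $H = j$. Let

\[ Z_i(n) := \min_{j \ne i} Z_{ij}(n) \tag{30} \]

denote the modified GLR with respect to $i$ for the nearest alternate.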
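
Because the factor $1/\prod_t (X_t!)$ cancels in the ratio, the modified GLR depends on the data only through the per-process sample counts and jump counts. The sketch below evaluates (29) and (30) with `math.lgamma`. The function names are hypothetical, indices are 0-based, and the code assumes every process has been sampled at least once and that no single process accounts for all the samples, so that the estimates in (21) are defined.

```python
import math


def _y_log_rate(y, n):
    """y * (log(y / n) - 1), with the convention 0 * log 0 = 0."""
    return 0.0 if y == 0 else y * (math.log(y / n) - 1.0)


def log_averaged_likelihood(i, counts, jumps):
    """log of (27), without the common factor 1 / prod(X_t!)."""
    n, y = sum(counts), sum(jumps)
    n_i, y_i = counts[i], jumps[i]
    return (math.lgamma(y_i + 1) - (y_i + 1) * math.log(n_i + 1)
            + math.lgamma(y - y_i + 1)
            - (y - y_i + 1) * math.log(n - n_i + 1))


def log_max_likelihood(j, counts, jumps):
    """log of (24), without the common factor 1 / prod(X_t!)."""
    n, y = sum(counts), sum(jumps)
    return (_y_log_rate(jumps[j], counts[j])
            + _y_log_rate(y - jumps[j], n - counts[j]))


def modified_glr(counts, jumps):
    """Return [Z_i(n)] of (30) from counts N_i^n and jump totals Y_i^n."""
    k = len(counts)
    avg = [log_averaged_likelihood(i, counts, jumps) for i in range(k)]
    mx = [log_max_likelihood(j, counts, jumps) for j in range(k)]
    return [min(avg[i] - mx[j] for j in range(k) if j != i) for i in range(k)]


# Hypothetical data: three processes, ten slots each, process 0 far more active.
z = modified_glr([10, 10, 10], [50, 10, 10])
```

For this data only the statistic of process 0 is positive, so it is the only candidate that can cross a positive threshold.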


We now describe our proposed policy. 


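Policy: Modified GLRT ($\pi_M(L)$)

Fix $L > 1$.

At time $n$ (end of slot $n$):

• Let $i^*(n) = \arg\max_i Z_i(n)$, the index with the largest modified GLR after $n$ time slots. Ties are resolved
uniformly at random.

• If $Z_{i^*(n)}(n) < \log((K - 1)L)$ then $A_{n+1}$ is chosen according to $\lambda^*(i^*(n), \hat{\theta}_{i^*(n),1}^n, \hat{\theta}_{i^*(n),2}^n)$, i.e.,

\[ \Pr(A_{n+1} = j \mid X^n, A^n) = \lambda^*(i^*(n), \hat{\theta}_{i^*(n),1}^n, \hat{\theta}_{i^*(n),2}^n)(j). \tag{31} \]

• If $Z_{i^*(n)}(n) \ge \log((K - 1)L)$ then the test retires and declares $i^*(n)$ as the true hypothesis.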
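
A single decision of the policy can be sketched as below. This is an illustrative reading of the policy box, not the authors' implementation: all names are hypothetical, indices are 0-based, ties are broken by taking the first maximiser rather than uniformly at random, $\lambda^*$ is found by ternary search on the concave one-dimensional objective of (11), and each process is assumed to have been sampled at least once.

```python
import math


def kl(x, y):
    """Poisson relative entropy D(x||y), with 0 log 0 = 0."""
    return (x * math.log(x / y) if x > 0 else 0.0) - x + y


def odd_mass(r1, r2, k, iterations=100):
    """lambda*(i)(i) from (11), by ternary search over [0, 1]."""
    def value(lam):
        c = (1.0 - lam) * (k - 2) / (k - 1)
        r = (lam * r1 + c * r2) / (lam + c)
        return lam * kl(r1, r) + c * kl(r2, r)
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if value(m1) < value(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)


def term(y, n):
    """y * (log(y / n) - 1), with 0 * log 0 = 0."""
    return 0.0 if y == 0 else y * (math.log(y / n) - 1.0)


def glr_statistics(counts, jumps):
    """[Z_i(n)] of (30) from the counts N_i^n and the jump totals Y_i^n."""
    k, n, y = len(counts), sum(counts), sum(jumps)
    avg = [math.lgamma(jumps[i] + 1) - (jumps[i] + 1) * math.log(counts[i] + 1)
           + math.lgamma(y - jumps[i] + 1)
           - (y - jumps[i] + 1) * math.log(n - counts[i] + 1)
           for i in range(k)]
    mx = [term(jumps[j], counts[j]) + term(y - jumps[j], n - counts[j])
          for j in range(k)]
    return [min(avg[i] - mx[j] for j in range(k) if j != i) for i in range(k)]


def policy_step(counts, jumps, big_l):
    """Return ('stop', i*) or ('continue', distribution for A_{n+1})."""
    k, n, y = len(counts), sum(counts), sum(jumps)
    z = glr_statistics(counts, jumps)
    best = max(range(k), key=lambda i: z[i])  # first maximiser (simplification)
    if z[best] >= math.log((k - 1) * big_l):
        return "stop", best
    theta1 = jumps[best] / counts[best]              # odd-rate estimate (21)
    theta2 = (y - jumps[best]) / (n - counts[best])  # non-odd-rate estimate (21)
    lam = odd_mass(theta1, theta2, k)
    return "continue", [lam if j == best else (1.0 - lam) / (k - 1)
                        for j in range(k)]


early = policy_step([10, 10, 10], [50, 10, 10], 10.0)
late = policy_step([10, 10, 10], [50, 10, 10], 1e6)
```

With the loose threshold the step stops and declares process 0; with the stringent one it returns a sampling distribution for the next slot.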


As done in a, we also consider two variants of $\pi_M(L)$ which are useful in the analysis.

• Policy $\pi_M^i(L)$: This is the same as $\pi_M(L)$, but stops only at decision $i$ when $Z_i(n) \ge \log((K - 1)L)$.

• Policy $\tilde{\pi}_M$: This is the same as $\pi_M(L)$, but never stops, and hence $L$ is irrelevant.

Under a fixed hypothesis $H = i$, and the triplet of policies $(\pi_M(L), \pi_M^i(L), \tilde{\pi}_M)$, it is easily seen that there is a common
underlying probability measure with respect to which the processes $(X_n, A_n)_{n \ge 1}$ associated with the three policies are naturally
coupled, with only the stopping times being different. We denote the stopping times by $\tau(\pi_M(L))$ and $\tau(\pi_M^i(L))$, respectively.
Under this coupling, the following are true:

\[ \{\tau(\pi_M(L)) > n\} \subset \{\tau(\pi_M^i(L)) > n\} \subset \{Z_i(n) < \log((K - 1)L)\}. \]


We now explore the characteristics of the proposed policy $\pi_M(L)$.

Proposition 4: Fix $L > 1$. Policy $\pi_M(L)$ stops in finite time with probability 1, that is, $P(\tau(\pi_M(L)) < \infty) = 1$.

Proof: See Appendix B-A. ■

In the proof, we argue that, when the odd process has index $i$, i.e., $H = i$, the test statistic $Z_i(n)$ has a strictly positive drift
and hence will cross the threshold $\log((K - 1)L)$ in finite time almost surely.

For any $\alpha$, we show that the policy $\pi_M(L)$, with $L$ chosen suitably, belongs to $\Pi(\alpha)$. In other words, $\pi_M(L)$ satisfies the
constraint on the probability of false detection.

Proposition 5: Fix $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_K)$. Let $L = 1/\min_k \alpha_k$. We then have $\pi_M(L) \in \Pi(\alpha)$.

Proof: From the choice of $L$, we have $1/L \le \alpha_k$ for all $k \in \{1, 2, \ldots, K\}$. This implies $\Pi((1/L, 1/L, \ldots, 1/L)) \subset \Pi(\alpha)$.
Hence, it suffices to show that $\pi_M(L) \in \Pi((1/L, 1/L, \ldots, 1/L))$.

Fix $\Psi = (i, R_1, R_2)$. Let $\Delta_j^n = \{\omega : \tau(\pi_M(L))(\omega) = n, \delta(\omega) = j\}$ denote the sample paths for which the decision maker
stops sampling after $n$ time slots and decides in favour of $H = j$. The decision region in favour of $j$ is denoted $\Delta_j := \cup_{n \ge 1} \Delta_j^n$.
Note that

\[ \Delta_j^n \cap \Delta_j^m = \emptyset \text{ for all } m \ne n. \tag{32} \]

We now use a standard change of measure argument to bound the conditional probability of false detection as follows, with
$P$ in place of $P^{\pi_M(L)}$:

\begin{align}
P(\delta \ne i \mid \Psi = (i, R_1, R_2)) &= \sum_{j \ne i} P(\delta = j \mid \Psi = (i, R_1, R_2)) + P(\tau(\pi_M(L)) = \infty \mid \Psi = (i, R_1, R_2)) \nonumber \\
&= \sum_{j \ne i} \sum_{n \ge 1} \int_{\omega \in \Delta_j^n} f(x^n, a^n \mid \Psi = (i, R_1, R_2)) \, d(x^n, a^n) \tag{33} \\
&\le \sum_{j \ne i} \sum_{n \ge 1} \int_{\omega \in \Delta_j^n} \left( \frac{\hat{f}(x^n, a^n \mid H = i)}{f(x^n, a^n \mid H = j)} \right) f(x^n, a^n \mid H = j) \, d(x^n, a^n) \tag{34} \\
&\le \sum_{j \ne i} \frac{1}{(K - 1)L} \sum_{n \ge 1} \int_{\omega \in \Delta_j^n} f(x^n, a^n \mid H = j) \, d(x^n, a^n) \tag{35} \\
&\le \frac{1}{L}. \tag{36}
\end{align}

The equality in (33) follows from (32) and from Proposition 4. The inequality in (34) follows because the maximum likelihood
function satisfies $\hat{f}(x^n, a^n \mid H = i) \ge f(x^n, a^n \mid \Psi = (i, R_1, R_2))$ for all $\Psi$ such that $H = i$. The inequality in (35) follows
because $\omega \in \Delta_j^n$ implies $Z_{ji}(n) \ge \log((K - 1)L)$, which in turn implies that the term within parentheses is upper bounded by
$1/((K - 1)L)$, a consequence of (28). The inequality in (36) follows because the inner summation in (35) is a sum of probabilities
of disjoint events, and hence is upper bounded by one. ■

Observe that we chose the modified GLR instead of the GLR precisely because we want to recognise the inner summation in
(35) as the probability of an event, and hence upper bounded by 1. If we had used the GLR, the integrand would have been a maximum
likelihood, which after summation and integration may not even be finite.
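
The threshold choice of Proposition 5 is a one-line computation. The sketch below maps a tolerance vector to $L = 1/\min_k \alpha_k$ and to the stopping threshold $\log((K - 1)L)$ used by the policy; the function name and the example tolerances are hypothetical.

```python
import math


def threshold_parameters(alpha):
    """Return (L, log((K - 1) L)) for a tolerance vector alpha of length K."""
    if not all(0.0 < a < 1.0 for a in alpha):
        raise ValueError("each tolerance must lie strictly between 0 and 1")
    k = len(alpha)
    big_l = 1.0 / min(alpha)
    return big_l, math.log((k - 1) * big_l)


# Three processes with unequal tolerances; the smallest tolerance sets L.
big_l, threshold = threshold_parameters([0.01, 0.05, 0.02])
```

Only the most stringent tolerance matters, so the resulting policy is conservative for hypotheses with looser tolerances.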

We now move on to show that $\pi_M(L)$ is asymptotically optimal. We first assert that the process $(Z_i(n))_{n \ge 1}$ has an asymptotic
drift equal to $D^*(i, R_1, R_2)$.

Proposition 6: Consider the non-stopping policy $\tilde{\pi}_M$. Let $\Psi = (i, R_1, R_2)$ be the true configuration. Then,

\[ \lim_{n \to \infty} \frac{Z_i(n)}{n} = D^*(i, R_1, R_2) \text{ almost surely.} \tag{37} \]

Proof: See Appendix B-B. ■

We now state the main proposition that upper bounds the expected stopping time of our proposed policy $\pi_M(L)$.

Proposition 7: Consider the policy $\pi_M(L)$. Let $\Psi = (i, R_1, R_2)$ be the true configuration. Then

\[ \limsup_{L \to \infty} \frac{\tau(\pi_M(L))}{\log(L)} \le \frac{1}{D^*(i, R_1, R_2)} \tag{38} \]

almost surely, and further,

\[ \limsup_{L \to \infty} \frac{E[\tau(\pi_M(L)) \mid \Psi]}{\log(L)} \le \frac{1}{D^*(i, R_1, R_2)}. \tag{39} \]


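We now state the main theorem that combines the lower bound in Proposition 1 and the upper bound in Proposition 7 to
show that our proposed policy $\pi_M(L)$ is asymptotically optimal.

Theorem 8: Consider $K$ homogeneous Poisson point processes with configuration $\Psi = (i, R_1, R_2)$. Let $(\alpha^{(n)})_{n \ge 1}$ be a
sequence of vectors, where $\alpha^{(n)}$ is the $n$th tolerance vector, such that $\lim_{n \to \infty} \|\alpha^{(n)}\| = 0$ and

\[ \limsup_{n \to \infty} \frac{\|\alpha^{(n)}\|}{\min_k \alpha_k^{(n)}} \le B \text{ for some } B. \tag{40} \]

Then, for each $n$, the policy $\pi_M(L_n)$ with $L_n = 1/\min_k \alpha_k^{(n)}$ belongs to $\Pi(\alpha^{(n)})$. Furthermore,

\begin{align}
\liminf_{n \to \infty} \inf_{\pi \in \Pi(\alpha^{(n)})} \frac{E[\tau(\pi) \mid \Psi]}{\log(L_n)} &= \limsup_{n \to \infty} \frac{E[\tau(\pi_M(L_n)) \mid \Psi]}{\log(L_n)} \tag{41} \\
&= \frac{1}{D^*(i, R_1, R_2)}. \tag{42}
\end{align}

Proof: The fact that $\pi_M(L_n) \in \Pi(\alpha^{(n)})$ follows from Proposition 5. We then have the following inequalities:

\begin{align}
\frac{1}{D^*(i, R_1, R_2)} &\le \liminf_{n \to \infty} \inf_{\pi \in \Pi(\alpha^{(n)})} \frac{E[\tau(\pi) \mid \Psi]}{-\log(\|\alpha^{(n)}\|)} \tag{43} \\
&= \liminf_{n \to \infty} \inf_{\pi \in \Pi(\alpha^{(n)})} \frac{E[\tau(\pi) \mid \Psi]}{\log(L_n)} \tag{44} \\
&\le \limsup_{n \to \infty} \frac{E[\tau(\pi_M(L_n)) \mid \Psi]}{\log(L_n)} \tag{45} \\
&\le \frac{1}{D^*(i, R_1, R_2)}. \tag{46}
\end{align}

Inequality (43) follows from Proposition 1. Equality (44) follows from the choice of $L_n$ and from assumption (40). Inequality
(45) follows because $\pi_M(L_n)$ is an element in $\Pi(\alpha^{(n)})$. Inequality (46) follows from Proposition 7. ■
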

V. Application to Visual Search

In this section we apply our results to the visual search experiments of Sripati and Olson. A decision theoretic viewpoint of these experiments was proposed by Vaidhiyan et al., and a suitable neuronal dissimilarity index based on an ASHT model for visual search was identified. The neuronal dissimilarity index was taken as the inverse of the constant to which $E[\tau(L)/\log(L)]$ converges as $L \to \infty$, where $1/L$ is the constraint on the probability of false detection and $\tau(L)$ is the stopping time of the optimal policy. We refer the reader to Vaidhiyan et al. for a more detailed exposition on the decision theoretic formulation. In that paper, it was assumed that $R_1$ and $R_2$ are known. If they are unknown and have to be learnt along the way, we fall into the framework of this paper, and the corresponding neuronal dissimilarity index would be $D^*(i, R_1, R_2)$.

In the visual search model of Vaidhiyan et al., an image was assumed to elicit, in a population of neurons, a spiking pattern according to a multi-dimensional Poisson point process. Also, given the firing rates, the processes were assumed to be independent of each other. The neuronal representation for an image was then taken as the corresponding firing rate vector. Our current model accounts for a one-dimensional Poisson point process, equivalent to a single neuron scenario. However, all our results for a one-dimensional Poisson point process extend naturally to multi-dimensional Poisson point processes. Hence, the extension of $D^*(i, R_1, R_2)$ to $D^*(i, \mathbf{R}_1, \mathbf{R}_2)$ for vectors of rates is straightforward: the formula for $D^*$ continues to hold with $R_1, R_2, R$ replaced with vectors $\mathbf{R}_1, \mathbf{R}_2, \mathbf{R}$ respectively, and $D(R_1 \| R)$ replaced by $D(\mathbf{R}_1 \| \mathbf{R}) = \sum_d D(R_1(d) \| R(d))$, where the summation is over neuron indices.
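As an illustration of the vector extension, the following minimal sketch evaluates $D^*$ numerically. It assumes that $D^*$ is the maximum over $\lambda \in [0,1]$ of the objective analysed in Appendix A, $\lambda D(\mathbf{R}_1\|\mathbf{R}) + (1-\lambda)\frac{K-2}{K-1} D(\mathbf{R}_2\|\mathbf{R})$, with the mixture rate $\mathbf{R}$ formed componentwise. The function names `poisson_kl` and `d_star`, and the use of ternary search (justified only by the concavity shown in Appendix A), are choices made for this sketch and are not taken from the paper.

```python
import math


def poisson_kl(x, y):
    """Relative entropy D(x||y) = x log(x/y) - x + y between Poisson means (y > 0)."""
    if x == 0.0:
        return y
    return x * math.log(x / y) - x + y


def d_star(r1, r2, num_processes, tol=1e-10):
    """Return (D*, maximising lambda) for rate vectors r1 (odd) and r2 (non-odd).

    Assumes num_processes >= 3 and strictly positive rates.
    """
    c = (num_processes - 2) / (num_processes - 1)

    def objective(lam):
        weight = lam + (1.0 - lam) * c
        total = 0.0
        for a, b in zip(r1, r2):
            # Componentwise mixture rate, mirroring the definition of theta in Appendix A.
            r = (lam * a + (1.0 - lam) * c * b) / weight
            total += lam * poisson_kl(a, r) + (1.0 - lam) * c * poisson_kl(b, r)
        return total

    # Ternary search: valid because the objective is concave in lambda.
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    lam = 0.5 * (lo + hi)
    return objective(lam), lam
```

For a single neuron, `d_star([1.0], [2.0], 3)` returns the scalar index together with the maximising sampling fraction; passing longer lists sums the per-neuron relative entropies as described above.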

Table I shows the correlation values for different dissimilarity indices. See Vaidhiyan et al. [4] for details on the different neuronal indices and different test statistics. We see that the inverse of the proposed $D^*$, as with the inverse of other indices, is strongly correlated with the average decision delay.


TABLE I 

Correlation with Different Neuronal Dissimilarity Indices 


Information Measure 

Correlation (s vs. neuronal index)

p-value 

D a 

0.89 

4.3 X 10“°® 

KL 

0.90 

3.1 X 10“°^ 

Chernoff

0.88 

2.1 X 10“°® 

E 

0.88 

1.1 X 10“°® 

D* (this paper) 

0.89 

8.7 X 10“°^ 


An ideal neuronal dissimilarity index, say $\mathrm{diff}(i, j)$, would satisfy $E[\tau(i, j)]\,\mathrm{diff}(i, j) = \text{constant}$ for each image pair $(i, j)$.

Vaidhiyan et al. proposed tests of equality of means to measure the dispersion of $E[\tau(i, j)]\,\mathrm{diff}(i, j)$ about a common mean. A natural statistic to test the dispersion of group means about a common mean is the ratio of the arithmetic mean (AM) to the geometric mean (GM) of the group means. It turns out that (AM/GM) is the statistic for a GLRT based equality of means test for Gamma distributed random variables under a fixed shape parameter assumption. The test for equality of means across groups for Gaussian random variables is the one-way ANOVA test. ANOVA is also widely used for non-Gaussian random variables because of its robustness.
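To make the AM/GM statistic concrete, here is a small sketch that computes $\log(\mathrm{AM}/\mathrm{GM})$ for a list of positive group means. The function name `log_am_gm` and the example inputs are hypothetical; the values reported in the tables come from the experimental data, not from this snippet.

```python
import math


def log_am_gm(group_means):
    """Return log(AM/GM) of strictly positive group means.

    The value is zero when all group means are equal and grows with dispersion.
    """
    n = len(group_means)
    arithmetic_mean = sum(group_means) / n
    # Log of the geometric mean, computed in the log domain for stability.
    log_geometric_mean = sum(math.log(m) for m in group_means) / n
    return math.log(arithmetic_mean) - log_geometric_mean
```

By the AM-GM inequality the statistic is non-negative, so smaller values indicate that the products $E[\tau(i,j)]\,\mathrm{diff}(i,j)$ are closer to a common constant.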
Table II shows the statistics related to the ANOVA and (AM/GM) tests. As with other indices, $D^*$ fails the equality of means tests (indicated by the p-values for ANOVA in the second column; similarly for the log(AM/GM) tests). When the statistics are used to rank order the indices, from the ANOVA statistic (smaller the better), we see that $D^*$ fares slightly worse than $D$, but better than the other indices. From the log(AM/GM) statistics we see that $D^*$ performs worse than the $D$ and KL indices, but better than Chernoff and $L_1$.


TABLE II 

Equality of means test using various test statistics 


diff 

ANOVA statistic 

ANOVA p-values 

log(AM/GM) 

D (3 

06.30 

9.35 X 10-1^ 

0.0200 

KL 

06.68 

2.88 X 10-2° 

0.0211 

Chernoff 

06.74 

1.61 X 10-2° 

0.0252 

Li 

24.00 

3.42 X 10-®'^ 

0.0678 

D* (this paper) 

06.34 

6.93 X 10-1° 

0.0233 


The slight degradation in performance of $D^*$ with respect to $D$ may be attributed to the particular experimental setup of Sripati and Olson. The search tasks associated with a given image pair belonged to the same block of trials, and hence were contiguous. This may have cued the subject about the upcoming image pair, which violates our assumption that the decision maker has no prior knowledge of the image pairs. More experiments with a wide variety of image pairs and few repetitions are required for a more thorough evaluation of the performance of $D^*$.

VI. Conclusion 

We studied the problem of detecting an odd Poisson point process having a rate different from the common rate of the others. We developed a lower bound on the conditional expected stopping time for any policy that satisfies the given constraint on the probability of false detection. We proposed a modified GLRT based algorithm, which we called $\pi_M$, and showed that it satisfies the given constraint on the probability of false detection, and that it is asymptotically optimal with respect to the conditional expected stopping time. The proposed algorithm employs a simple threshold criterion for stopping. Interestingly, we also showed that, independent of the configuration, the sampling probability for each process is strictly above a positive constant.

We applied our results to the visual search experiments of Sripati and Olson. We proposed $D^*$ as a candidate neuronal dissimilarity index. $D^*$ correlated strongly with the behavioural data. The performance of $D^*$ was marginally inferior to the neuronal dissimilarity index proposed by Vaidhiyan et al.

This work was restricted to Poisson processes. Extension to other classes of distributions, especially exponential families, is under consideration. Extension to a general class of distributions would also be interesting.
Appendix A

Proof of Proposition 3

Let us rewrite

\[ \lambda^*(k, \theta_1, \theta_2)(k) = \arg\max_{0 \le \lambda \le 1} \left[ \lambda D(\theta_1 \| \theta) + (1 - \lambda) \frac{K-2}{K-1} D(\theta_2 \| \theta) \right], \]

where $\theta$, as in (12), is given by

\[ \theta = \frac{\lambda \theta_1 + (1 - \lambda) \frac{K-2}{K-1} \theta_2}{\lambda + (1 - \lambda) \frac{K-2}{K-1}}. \tag{47} \]

We have abused notation and have used $\lambda$ to denote the scalar $\lambda(k)$ of (11). We first show that the second derivative of the objective function in the above optimisation is negative for all $\lambda$ to establish concavity. Define the objective function as

\[ f(\lambda) := \lambda D(\theta_1 \| \theta) + (1 - \lambda) \frac{K-2}{K-1} D(\theta_2 \| \theta), \]
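where $\theta$, a function of $\lambda$, is as in (47). We then have

\begin{align}
f'(\lambda) &= D(\theta_1 \| \theta) - \frac{K-2}{K-1} D(\theta_2 \| \theta) + \left( \lambda D'(\theta_1 \| \theta) + (1 - \lambda) \frac{K-2}{K-1} D'(\theta_2 \| \theta) \right) \frac{d\theta}{d\lambda} \tag{48} \\
&= D(\theta_1 \| \theta) - \frac{K-2}{K-1} D(\theta_2 \| \theta), \tag{49}
\end{align}

where, we recall, $D'(x \| y)$ is the derivative of $D(x \| y)$ with respect to the second argument $y$, which turns out to be $1 - x/y$. Equality (49) follows from (16), which ensures that the term within the parenthesis is identically zero. Differentiating once again,

\begin{align}
f''(\lambda) &= \left[ \left(1 - \frac{\theta_1}{\theta}\right) - \frac{K-2}{K-1} \left(1 - \frac{\theta_2}{\theta}\right) \right] \frac{d\theta}{d\lambda} \tag{50} \\
&= -\frac{\theta}{\lambda + (1 - \lambda) \frac{K-2}{K-1}} \left[ \left(1 - \frac{\theta_1}{\theta}\right) - \frac{K-2}{K-1} \left(1 - \frac{\theta_2}{\theta}\right) \right]^2 \tag{51} \\
&< 0, \tag{52}
\end{align}

where we have used the fact that

\begin{align}
\frac{d\theta}{d\lambda} &= \frac{(\theta_1 - \theta) - \frac{K-2}{K-1} (\theta_2 - \theta)}{\lambda + (1 - \lambda) \frac{K-2}{K-1}} \tag{53} \\
&= -\frac{\theta}{\lambda + (1 - \lambda) \frac{K-2}{K-1}} \left[ \left(1 - \frac{\theta_1}{\theta}\right) - \frac{K-2}{K-1} \left(1 - \frac{\theta_2}{\theta}\right) \right]. \tag{54}
\end{align}

Since $f(\lambda)$ is concave in $\lambda$, and since $f(0) = f(1) = 0$, $f'(0) > 0$ and $f'(1) < 0$, the maximiser $\lambda^*$ satisfies

\[ D(\theta_1 \| \theta) - \frac{K-2}{K-1} D(\theta_2 \| \theta) = 0. \tag{55} \]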


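We do not know of a closed form expression for $\lambda^*$ from (55). Let $\tilde{\lambda}$ denote a parametrisation of $\lambda$ of the form

\[ \tilde{\lambda} := \frac{\lambda}{\lambda + (1 - \lambda) \frac{K-2}{K-1}}, \tag{56} \]

so that $\theta = \tilde{\lambda} \theta_1 + (1 - \tilde{\lambda}) \theta_2$. We recognise that $\tilde{\lambda}$ is increasing in $\lambda$. Let $\tilde{\lambda}^*$ denote the re-parametrisation for $\lambda^*$ according to (56). Hence, to show that $\lambda^*$ is bounded away from 0 and 1, it suffices to show that $\tilde{\lambda}^*$ is bounded away from 0 and 1. Let us first consider the case when $\theta_1 < \theta_2$. The case when $\theta_1 > \theta_2$ has similar arguments. Let us consider a new parametrisation of (55). Let $v$ denote

\[ v := \frac{\theta_2}{\theta_2 - \theta_1}, \tag{57} \]

so that

\[ v - 1 = \frac{\theta_1}{\theta_2 - \theta_1}, \tag{58} \]

and

\begin{align}
\frac{\theta}{\theta_2 - \theta_1} &= \frac{\tilde{\lambda} \theta_1 + (1 - \tilde{\lambda}) \theta_2}{\theta_2 - \theta_1} \tag{59} \\
&= v - \tilde{\lambda}. \tag{60}
\end{align}

Since $D(ax \| ay) = a D(x \| y)$ for $a > 0$, the left-hand side of (55) can now be written, up to the positive factor $\theta_2 - \theta_1$, in terms of $v$ and $\tilde{\lambda}$ as

\[ D(v - 1 \| v - \tilde{\lambda}) - \frac{K-2}{K-1} D(v \| v - \tilde{\lambda}). \tag{61} \]

Let $\tilde{\lambda}_r(v)$ denote the solution to

\[ D(v - 1 \| v - \tilde{\lambda}) - r D(v \| v - \tilde{\lambda}) = 0. \tag{62} \]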
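The root of the equation $D(v-1\|v-\tilde\lambda) - rD(v\|v-\tilde\lambda) = 0$ has no closed form, but it is easy to approximate. The sketch below uses bisection, which is an assumption of this example rather than a method from the paper; it relies on the left-hand side being positive at $\tilde\lambda = 0$ and negative near $\tilde\lambda = 1$, with a single sign change inherited from the concavity of $f$. The names `poisson_kl` and `lambda_tilde` are hypothetical.

```python
import math


def poisson_kl(x, y):
    """Relative entropy D(x||y) = x log(x/y) - x + y between Poisson means (y > 0)."""
    if x == 0.0:
        return y
    return x * math.log(x / y) - x + y


def lambda_tilde(v, r, tol=1e-12):
    """Approximate the root in (0, 1) of D(v-1||v-lam) - r D(v||v-lam) = 0, for v >= 1."""

    def g(lam):
        return poisson_kl(v - 1.0, v - lam) - r * poisson_kl(v, v - lam)

    # g(0) = D(v-1||v) > 0, while g is negative just below lam = 1.
    lo, hi = 0.0, 1.0 - 1e-12
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Evaluating `lambda_tilde(v, (K - 2) / (K - 1))` over a grid of $v \ge 1$ and $K \ge 3$ gives a quick numerical sanity check of the claim that the optimal sampling fraction stays away from 0 and 1.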
Fig. 2. Geometric interpretation of $\tilde{\lambda}^*$.

Figure 2 gives a geometric interpretation of $\tilde{\lambda}^*$. Note that $\tilde{\lambda}^*(v) = \tilde{\lambda}_r(v)$ for $r = (K-2)/(K-1)$. For each $v \ge 1$, we also have that $\tilde{\lambda}_r(v)$ decreases with $r$. Furthermore, $0.5 \le (K-2)/(K-1) < 2$. We then have $\tilde{\lambda}_2(v) < \tilde{\lambda}^*(v) \le \tilde{\lambda}_{0.5}(v)$. Hence, to show that $\tilde{\lambda}^*(v)$ is bounded away from 0 and 1 for all $v$, it suffices to show that $\sup_{v \ge 1} \tilde{\lambda}_{0.5}(v) < 1$, and that $\inf_{v \ge 1} \tilde{\lambda}_2(v) > 0$.

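We now obtain a Taylor series based alternate expression for $D(v - a \| v - b)$ when $v \ge 1$ and $0 \le a, b \le 1$. The alternate expression replaces the log terms in (61) with infinite sums and enables easier bounding of (61).

Lemma 9: Let $v \ge 1$. Let $0 \le a, b \le 1$. Let $D(x \| y) = x \log(x/y) - x + y$ denote the relative entropy between two Poisson random variables with means $x$ and $y$. Then,

\begin{align}
D(v - a \| v - b) &= (v - a) \log\left(\frac{v - a}{v - b}\right) - (v - a) + (v - b) \tag{63} \\
&= \sum_{l \ge 1} \frac{1}{v^l\, l(l+1)} \left( a^{l+1} - b^l \left(a + (a - b) l\right) \right). \tag{64}
\end{align}

Proof: Case 1: Let $v > 1$. Let $0 \le a, b \le 1$.

Using the Taylor series expansion $-\log(1 - x) = \sum_{l \ge 1} x^l / l$ when $|x| < 1$, we get

\begin{align}
D(v - a \| v - b) &= (v - a) \log\left(\frac{1 - a/v}{1 - b/v}\right) - (v - a) + (v - b) \tag{65} \\
&= -(v - a) \sum_{l \ge 1} \frac{a^l}{l v^l} + (v - a) \sum_{l \ge 1} \frac{b^l}{l v^l} + (a - b) \tag{66} \\
&= (-a + b) + \sum_{l \ge 2} \frac{b^l - a^l}{l v^{l-1}} + \sum_{l \ge 1} \frac{a^{l+1} - a b^l}{l v^l} + (a - b) \tag{67} \\
&= \sum_{l \ge 1} \frac{1}{v^l} \left( \frac{b^{l+1} - a^{l+1}}{l + 1} + \frac{a^{l+1} - a b^l}{l} \right) \tag{68} \\
&= \sum_{l \ge 1} \frac{1}{v^l\, l(l+1)} \left( a^{l+1} - b^l \left(a + (a - b) l\right) \right). \tag{69}
\end{align}

Case 2: Let $v = 1$. Let $0 \le a, b < 1$. The same arguments as above hold.

Case 3: Let $v = 1$. Let $a = 1$, $b < 1$. Then,

\begin{align}
\sum_{l \ge 1} \frac{1}{l(l+1)} \left( 1 - b^l \left(1 + (1 - b) l\right) \right) &= \sum_{l \ge 1} \left( \frac{1}{l(l+1)} - \frac{b^l}{l} + \frac{b^{l+1}}{l+1} \right) \tag{70} \\
&= (1 - b) \tag{71} \\
&= D(0 \| 1 - b). \tag{72}
\end{align}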
(72) 
Fig. 3. Variation of $(1 - c^l(1 + l - 0.5cl))$ with $l$.

Case 4: Let $v = 1$. Let $a < 1$, $b = 1$. Then, both $D(v - a \| v - b)$ and the infinite sum are infinity.

Case 5: Let $v = 1$. Let $a = 1$, $b = 1$. Then both $D(v - a \| v - b)$ and the infinite sum are zero. ■
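The series identity of Lemma 9 can be checked numerically. The sketch below compares a truncated version of the sum $\sum_{l \ge 1} \frac{1}{v^l l(l+1)}\left(a^{l+1} - b^l(a + (a-b)l)\right)$ with the direct formula for $D(v-a\|v-b)$. The helper names and the truncation length are assumptions of this example; the ratios $a/v$ and $b/v$ are raised to the power $l$ directly so that large $v^l$ never has to be formed.

```python
import math


def poisson_kl(x, y):
    """Relative entropy D(x||y) = x log(x/y) - x + y between Poisson means (y > 0)."""
    if x == 0.0:
        return y
    return x * math.log(x / y) - x + y


def series_kl(v, a, b, terms=2000):
    """Truncated series for D(v-a||v-b), valid for v >= 1 and 0 <= a, b <= 1."""
    total = 0.0
    for l in range(1, terms + 1):
        # Equivalent to (a^(l+1) - b^l (a + (a-b) l)) / (v^l l (l+1)).
        numerator = a * (a / v) ** l - (b / v) ** l * (a + (a - b) * l)
        total += numerator / (l * (l + 1))
    return total
```

For $v > 1$ the terms decay geometrically and the truncation error is negligible; in the boundary case $v = 1$, $a = 1$ the terms decay only like $1/l^2$, so agreement is limited to roughly $1/\text{terms}$.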

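We now show that $\tilde{\lambda}_{0.5}(v) < 0.9$ for all $v \ge 1$. For this, it suffices to show that for $c = 0.9$, $D(v - 1 \| v - c) - 0.5 D(v \| v - c) < 0$ for all $v \ge 1$. From Lemma 9,

\begin{align}
D(v - 1 \| v - c) - 0.5 D(v \| v - c) &= \sum_{l \ge 1} \frac{1}{v^l\, l(l+1)} \left( 1 - c^l (1 + (1 - c) l) - 0.5 c^{l+1} l \right) \tag{73} \\
&= \sum_{l \ge 1} \frac{1}{v^l\, l(l+1)} \left( 1 - c^l (l + 1) + c^{l+1} l - 0.5 c^{l+1} l \right) \tag{74} \\
&= \sum_{l \ge 1} \frac{1}{v^l\, l(l+1)} \left( 1 - c^l (1 + l - 0.5 c l) \right). \tag{75}
\end{align}

Let us first consider the case when $v = 1$ and $c = 0.9$. We then have

\begin{align}
D(v - 1 \| v - c) - 0.5 D(v \| v - c) &= D(0 \| 0.1) - 0.5 D(1 \| 0.1) \nonumber \\
&= 0.1 - 0.5 (\log(10) - 0.9) \nonumber \\
&< 0. \tag{76}
\end{align}

Thus, $\tilde{\lambda}_{0.5}(1) < 0.9$. For $v > 1$ and $c = 0.9$, we observe that $(1 - c^l(1 + l - 0.5cl))$ is initially negative and then becomes positive in $l$ (see Figure 3). Thus, there exists $M \ge 1$ such that

\[ \left( 1 - c^l (1 + l - 0.5 c l) \right) \begin{cases} < 0 & \forall\, l < M, \\ \ge 0 & \forall\, l \ge M. \end{cases} \tag{77} \]

Then, for $c = 0.9$, we have

\begin{align}
D(v - 1 \| v - c) - 0.5 D(v \| v - c) &= \sum_{l \ge 1} \frac{1}{v^l\, l(l+1)} \left( 1 - c^l (1 + l - 0.5 c l) \right) \nonumber \\
&\le \sum_{1 \le l < M} \frac{1 - c^l (1 + l - 0.5 c l)}{v^M\, l(l+1)} + \sum_{l \ge M} \frac{1 - c^l (1 + l - 0.5 c l)}{v^M\, l(l+1)} \tag{78} \\
&= \frac{1}{v^M} \sum_{l \ge 1} \frac{1 - c^l (1 + l - 0.5 c l)}{l(l+1)} \nonumber \\
&= \frac{1}{v^M} \left( D(0 \| 1 - c) - 0.5 D(1 \| 1 - c) \right) \nonumber \\
&= \frac{1}{v^M} \left( D(0 \| 0.1) - 0.5 D(1 \| 0.1) \right) \nonumber \\
&< 0. \tag{79}
\end{align}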
Inequality (78) is obtained by upper bounding 1) the initial negative terms, for $l < M$, by replacing $v^l$ by a larger $v^M$, and 2) the later non-negative terms, for $l \ge M$, by replacing $v^l$ by a smaller $v^M$. Inequality (79) follows from (76). Thus, we have shown that $\tilde{\lambda}_{0.5}(v) < 0.9$ for all $v \ge 1$.
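The two numerical facts used in this argument, namely that $0.1 - 0.5(\log(10) - 0.9)$ is negative and that the summand $1 - c^l(1 + l - 0.5cl)$ changes sign exactly once for $c = 0.9$, can be confirmed with a few lines of code. This is only a sanity check over a finite range of $l$, written for this exposition; the function names are hypothetical.

```python
import math


def summand(l, c):
    """Return 1 - c^l (1 + l - 0.5 c l), the bracketed term in the series for c in (0, 1)."""
    return 1.0 - c ** l * (1.0 + l - 0.5 * c * l)


def first_nonnegative_index(c, max_l=500):
    """Return the smallest l >= 1 at which the summand is non-negative."""
    for l in range(1, max_l + 1):
        if summand(l, c) >= 0.0:
            return l
    raise ValueError("no sign change found up to max_l")


# Value of D(0||0.1) - 0.5 D(1||0.1), the v = 1 case of the bound.
value_at_v_equal_one = 0.1 - 0.5 * (math.log(10.0) - 0.9)
```

With $c = 0.9$, `first_nonnegative_index(0.9)` plays the role of $M$ in the proof, and `value_at_v_equal_one` is the negative constant that multiplies $1/v^M$ in the final bound.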

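We now show the second part of the proof, i.e., $\tilde{\lambda}_2(v) > 0.1 > 0$. For this, it suffices to show that for $c = 0.1$, $D(v - 1 \| v - c) - 2 D(v \| v - c) > 0$ for all $v \ge 1$. For $c = 0.1$, we have

\begin{align}
D(v - 1 \| v - c) - 2 D(v \| v - c) &= \sum_{l \ge 1} \frac{1}{v^l\, l(l+1)} \left( 1 - c^l (1 + (1 - c) l) - 2 c^{l+1} l \right) \tag{80} \\
&= \sum_{l \ge 1} \frac{1}{v^l\, l(l+1)} \left( 1 - c^l (1 + l + c l) \right) \tag{81} \\
&= \sum_{l \ge 1} \frac{1}{v^l\, l(l+1)} \left( 1 - (0.1)^l (1 + l + (0.1) l) \right) \tag{82} \\
&> 0, \tag{83}
\end{align}

where (83) follows as each term inside the summation in (82) is positive. Thus, when $\theta_1 < \theta_2$ and for all $v \ge 1$, we have shown that

\[ 0.1 < \tilde{\lambda}_2(v) < \tilde{\lambda}^*(v) \le \tilde{\lambda}_{0.5}(v) < 0.9. \]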


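We now consider the case when $\theta_1 > \theta_2$. Let

\[ v' := \frac{\theta_1}{\theta_1 - \theta_2}, \tag{84} \]

so that

\[ v' - 1 = \frac{\theta_2}{\theta_1 - \theta_2}, \tag{85} \]

and

\begin{align}
\frac{\theta}{\theta_1 - \theta_2} &= \frac{\tilde{\lambda} \theta_1 + (1 - \tilde{\lambda}) \theta_2}{\theta_1 - \theta_2} \tag{86} \\
&= \tilde{\lambda} v' + (1 - \tilde{\lambda})(v' - 1) \tag{87} \\
&= v' - 1 + \tilde{\lambda}. \tag{88}
\end{align}

Equation (55) can now be written in terms of $v'$ and $\tilde{\lambda}$ as

\[ D(v' \| v' - 1 + \tilde{\lambda}) - \frac{K-2}{K-1} D(v' - 1 \| v' - 1 + \tilde{\lambda}) = 0. \tag{89} \]

Let $\tilde{\lambda}^*(v')$ be the solution to (89). Recognise that (89) has the same form as in the previous case for $\theta_1 < \theta_2$, with only the multiplicative constant being different. From arguments similar to the ones used in the previous case of $\theta_1 < \theta_2$, we can show that

\[ 0.1 < (1 - \tilde{\lambda}^*(v')) < 0.9, \]

or equivalently, $0.1 < \tilde{\lambda}^*(v') < 0.9$.

Thus, we have shown that $\tilde{\lambda}^*$ lies in $(0.1, 0.9)$, and hence is bounded away from 0 and 1, for all $\theta_1$ and $\theta_2$. ■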


Appendix B 


We stated the main properties of the proposed policy $\pi_M$ in Section IV. We prove them in this Appendix.


A. Proof of Proposition and associated ingredients 

Before we prove the proposition, we develop some convergence results for $\pi_M$. We show that under the non-stopping policy $\pi_M$, the empirical rate associated with a process converges to the true rate of that process. The results are akin to convergence results for independent random variables, but applied to the dependent random variables in our setting, with the dependency being induced by the policy.

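Proposition 10: Fix $K \ge 3$. Let $\Psi = (i, R_1, R_2)$ be the true configuration. Consider the non-stopping policy $\pi_M$. As $n \to \infty$ the following convergences hold almost surely:

\[ \frac{Y_j^n}{N_j^n} \to \begin{cases} R_1 & \text{if } j = i, \\ R_2 & \text{if } j \ne i, \end{cases} \tag{90} \]

and

\[ \frac{Y^n - Y_i^n}{n - N_i^n} \to R_2, \tag{91} \]

\[ R'_{\min} \le \liminf_{n \to \infty} \frac{Y^n - Y_j^n}{n - N_j^n} \le \limsup_{n \to \infty} \frac{Y^n - Y_j^n}{n - N_j^n} \le R'_{\max} \quad \text{for all } j \ne i, \tag{92} \]

where

\[ R'_{\min} = (1 - c_K) \min\{R_1, R_2\} + c_K \max\{R_1, R_2\} \tag{93} \]

and

\[ R'_{\max} = c_K \min\{R_1, R_2\} + (1 - c_K) \max\{R_1, R_2\}, \tag{94} \]

and $c_K$ is the positive lower bound on the per-slot sampling probability of each process under $\pi_M$.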

Proof: Let J-i-i denote the cr-field generated by Consider the martingale difference sequence 

71 

= Yf - NfR, =J2(Xi - 
1=1 

Given the Poisson assumption on Xi, we have E [{Xi — |< oo for all 1. Then, by the convergence result 

for martingales, see De la Pena ifTSl Theorem 1.2A], for any e > 0, there exists > 0 such that 


P{S^ > ne) < 

which in turn, by the Borel-Cantelli Lemma lfT9l sec 4.2], implies 

Sn 

— -0 almost surely. 


(95) 


(96) 


Similarly arguing, we conclude that convergence result holds for other S'j jn, for j = 1, 2,..., iT. Further, from Proposition 
we have 


Combining (96 1 and (97 1 , we have. 


or equivalently. 


iV" 

lim inf —— > ck > 0 almost surely. 

n—>-oo fl 


s^. 

—>■ 0 almost surely. 


y^n 

— - Ri almost surely. 

j\jn 


(97) 


(98) 


(99) 


Similar result hold for other j, with i?i replaced by i? 2 , and we have established (j9^. Furthermore, these results imply that 

{y--y-)-Y.u^^n-r^ 


Consequently, we get 


Fix j i, we then have 


{n - N^) 

(F” - F”) 

{n-Nf) ^ 

Ek^,N-R, _ Nl 

(n — N^) n — N^‘ 


—>■ 0 almost surely. 


i ?2 almost surely. 




n-m 


Ri. 


( 100 ) 


( 101 ) 


( 102 ) 


We do not yet have a convergence result for Nf/n for any k. Proposition (j^ only says that at every slot and for each process, 
the probability of choosing that process is greater than ck- Thus, we are not in a position to say, as n —> oo, whether 

AFn _ -y'n 

3 

n — 


—> constant. 
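However, since each process is sampled with probability at least $c_K$ in every slot, we get the following bound, where $R_k$ denotes the true rate of process $k$:

\begin{align}
(1 - c_K) \min\{R_1, R_2\} + c_K \max\{R_1, R_2\} &\le \liminf_{n \to \infty} \frac{\sum_{k \ne j} N_k^n R_k}{n - N_j^n} \le \limsup_{n \to \infty} \frac{\sum_{k \ne j} N_k^n R_k}{n - N_j^n} \nonumber \\
&\le c_K \min\{R_1, R_2\} + (1 - c_K) \max\{R_1, R_2\} \quad \text{almost surely.} \tag{103}
\end{align}

Thus, (100) combined with (103) yields (92). ■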

We now state a lemma that asserts that, under the non-stopping policy $\pi_M$, $Z_i(n)$, the test statistic associated with the index of the odd process, drifts off to infinity.

Lemma 11: Fix $K \ge 3$. Let $\Psi = (i, R_1, R_2)$ be the true configuration. Consider the non-stopping policy $\pi_M$. Then for all $j \ne i$, we have

\[ \liminf_{n \to \infty} \frac{Z_{ij}(n)}{n} > 0 \quad \text{almost surely.} \tag{104} \]

Proof: Without loss of generality assume $R_1 < R_2$. Observe that we have $R_1 < R'_{\min} \le R'_{\max} < R_2$. Recall that

\[ D(x \| y) := x \log(x/y) - x + y, \]

the relative entropy between two Poisson distributions with means $x$ and $y$. We can write (29) as

r(T;" -f 1) r(r” - f" -p i) 


Zij{n) = log 


{Nf + l)(L"+i) (^n-Nf + l)(i""-L"+i) 

/ / F" \ \ / /y^ -Y^ 


.-Nl' 


- 1 


> Yf' log 


^71 


Nf + 1 
+ (F"-F”)log 


-F" + log(^ 
F" - Y 


\/W' 

Nf + 1 


n — Nf -\-1 


- (F” - YD + log 


' ^27t{Y^-Y f)' 
n — TV” + 1 


/ F” \ Y'^ -Y^\ 

log her - + (^" - yn log —^ - (F” - F,”) 


= {Nr + l)D 




Y'p 


Nf + 1 n — TVJ^ 

. ‘ j 


n — N" 


yn _ yn \ / yn _ yn _ yn yn _ yn 

^ 1 + („ _ iv” - m)D I * J- II 


n — Nf — TV” 

^ J 


( 

- NfD ' ^ 


F” - F" 


yn _ yn 

^^ i 

n — Nf + 1 


Nj- n — TV” + 1 


yn _ yn 

j 

n — TV” 


/ yn _ yn _ yi 

-{n-Nf- TV")D ' * ^ 


-Tvr 


F” - F” 


n — TV” — TV” n — TV” + 1 

^ J ^ . 




TV” + 1 


n — TV" + 1 


(105) 


(106) 

p.54]. 


where the inequality (105) follows from the lower bound for the gamma function, $\Gamma(x + 1) = x! \ge x^x e^{-x} \sqrt{2\pi x}$, and the equality (106) follows from the use of the formula for $D(x \| y)$ and some rearrangement of terms.
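The Stirling-type bounds on the gamma function that drive this step (and the matching upper bound $x! \le x^x e^{-x + 1/(12x)} \sqrt{2\pi x}$ used later for the limsup) can be checked in the log domain. The sketch below uses Python's standard `math.lgamma`; the function name `stirling_log_bounds` is an assumption of this example.

```python
import math


def stirling_log_bounds(x):
    """Return (lower, upper) bounds on log Gamma(x + 1) for x > 0.

    lower = x log x - x + 0.5 log(2 pi x), upper = lower + 1 / (12 x).
    """
    lower = x * math.log(x) - x + 0.5 * math.log(2.0 * math.pi * x)
    upper = lower + 1.0 / (12.0 * x)
    return lower, upper
```

Because both bounds differ from $\log \Gamma(x+1)$ by at most $1/(12x)$, the correction terms they introduce in the test statistic are logarithmic in the counts and vanish after division by $n$.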

We now study the convergence of each of the terms in (106). All convergence statements are in the almost sure sense. Consider the first term in (106). From Proposition 10 and the lower bound $c_K$ on the sampling probabilities, as $n \to \infty$, we have

yn _ yr. 


yn 


i?i, liminf 


TV” + 1 


„„„„ and liminf— 

n-)-oo n - TV” n-S-oo n 


TV” + 1 

Consequently, and using the fact that D{x\\y) is monotone increasing in y, for y > x, we have 

F 


> Ck- 


TV” + 1 

liminf ^ D 

n—>-oo Tl 


TV" -f 1 n — N] 


yn _ yn \ 

^ > CkD (i?l||i?Ln) > 0- 


(107) 


( 108 ) 


Similarly, for the the second term in (106 1 , we have 

yn _ yn _ yn yn _ yn ^ _ jyn _ jyn 

n-TV”-TV” ^ ^^ 
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Consequently, and using the fact that D{x\\y) is monotone decreasing in y, for y < x, we have 


n-m - m 

lim inf-^- ^-D 

n—^oo Tl 


yn _ yn _ y^n \ 


Consider the third term in (106 1 . From Proposition [T0| 


3 3 , 

as n —)■ c», we have 

_ yrn 


Yn 

and 


i?2. 


Consequently, 


D 




n — Nr — N" 


n — N'P 


DiR2\\R2)=0. 


( 110 ) 


( 111 ) 


( 112 ) 


Similarly, for the fourth term in (106 1 we get 


Consequently, 


_ Y'^ _ Y'^ 

i j 

n — Nr — Nr 


/ _ Y'^ Y^ 

D i i 3 


i ?2 and 


■y^n _ Y^ 

_ _ ^ i 

n — Nr + 1 


i? 2 . 


yn _ Yr 


n — Nr — Nr n — Nr + 1 


D{R2\\R2) = 0 . 


Consider the hfth and sixth terms in (106 1 . From Proposition 10 we have 


Y'^ _ Y^ 

Consequently, when divided by n and as n —oo, both the terms go to zero, i.e., 

y y^n _ y/^n y Y^ _ Y'^ 

-——^ -)• 0 and lim sup-- 4 - = 0 . 

nn- Nr + I n-i-oo n n- Nr 


(113) 


(114) 


(115) 


(116) 


Consider the seventh and eighth terms in (106). Both terms go to negative infinity, but only logarithmically in $n$, and hence when divided by $n$ and as $n \to \infty$, we get


1 j2Wp\ 1 / 427r(y" - 4”) \ 

- log Y , ^ 0 and - log I ^ ^ ; ’ 1 -> 0 . 


Nr +1 


Thus, we have 

lim inf ^ > lim inf 


(TVn + 1) 


D 


rr 


n 


y^n _ y^n 

I J 


Nr + l n- Nr 


+ 


n — Nr + 1 


{n-N^- m) 


(117) 


y^n _ y^n _ y^n y^n _ y^n 


.-Nr- Nr 


.-Nr 


> Cifi7(i?i||i?4m) + CKD{R2\\R'^ax) 

> 0 . 


(118) 

(119) 


This completes the proof of Lemma 11 


Proof of Proposition ^ We now have the ingredients to prove Proposition The following inequalities hold almost 
surely, 

t{ttm{L)) < T(7r4(L)) 

= inf{n > \\Zi{n) > log((iT — 1)L)} 

< inf{n > l\Zij{n') > log((iC — 1)L) for all n' >n and for all j i} 

< oo, ( 120 ) 


where the last inequality follows from Lemma 11. ■


While in Proposition 10 we established that under the non-stopping policy $\pi_M$, $Y_k^n / N_k^n \to R_k$ almost surely, the question of convergence of $N_k^n / n$ to some real constant under the $\pi_M$ policy remained to be established. We now show that under the $\pi_M$ policy it does converge to a real constant. Furthermore, we show that $(Y^n - Y_j^n)/(n - N_j^n)$ also converges to a constant.
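Proposition 12: Fix $K \ge 3$. Let $\Psi = (i, R_1, R_2)$ be the true configuration. Consider the non-stopping policy $\pi_M$. Then as $n \to \infty$, the following convergences hold almost surely:

(i) $i^*(n) \to i$, (121)

(ii) $\hat{R}_{i^*(n),1} := Y^n_{i^*(n)} / N^n_{i^*(n)} \to R_1$, (122)

(iii) $\hat{R}_{i^*(n),2} := (Y^n - Y^n_{i^*(n)}) / (n - N^n_{i^*(n)}) \to R_2$, (123)

(iv) $\lambda^*(i^*(n), \hat{R}_{i^*(n),1}, \hat{R}_{i^*(n),2}) \to \lambda^*(i, R_1, R_2)$, (124)

(v) $N_j^n / n \to \lambda^*(i, R_1, R_2)(j)$ for all $j = 1, 2, \ldots, K$, (125)

(vi) $(Y^n - Y_j^n)/(n - N_j^n) \to R(\lambda^*(i, R_1, R_2)(i))$ for all $j \ne i$, (126)

where $R$ is as in (12).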

Proof: From Lemma [m we 


have 

liminf Zi{n) 

n—foc) 


liminf min Zi,(n) > 0 almost surely. 

n—>-oo j^i 


(127) 


Fix j i. Then, the following inequalities hold almost surely. 


lim sup Zj (n) = lim sup min Zjk (n) 

TL—fOO n—foo 

< limsupZji(n) 

n—¥oo 

< lim sup Zij{ n) 

n—¥oo 

< — liminf min 

n—)-oo k^i 

= — liminf Zi{n) 

n—foo 

< 0 . 


It further implies, z*(n) = max^ Zk{n) = i almost surely. This proves (i). 

All convergence statements are in the almost sure sense. From (i) and Proposition [T0| we get 


nn _ 

2 *(n),l — jyn 

i*{n) 




and similarly we get. 


0,, 




(n),2 


*(") 


.-N'. 


i‘{n) 


yzn _ yrn 

_ _ ^ i 

n-Nf 


R2. 


This proves (ii) and (iii). 

From (i), (ii) and (iii) we have 







where we have used the fact that $\lambda^*(i, x, y)$ is jointly continuous in $(x, y)$, a fact that follows from Berge's Maximum Theorem.
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Consider the martingale sequence 
as used in ([9^, we get 


Ym=i •^*(**(’^)) ^r*(n) 1 ’ ^r*(n) 2 )(j)- From (iv) and martingale convergence arguments, 

jyn , n 

—>■ X*{i, i?i, i?2)0)- 


For ease of notation, let A*(j) denote A*(z, i?i, i? 2 )(*)- We can rewrite (F" — y")/(n — as 


■^n _ -y^n 

i 

n — W" 


_|_ -y^n _ y^n _ 


n — iV” 


n 


n-m - m F" - K" - F” 

^ J ^ J 


n — Nr — N" 


n — N” 


Then, from (v) we have the following convergence in almost sure sense. 


yn _ yn 

3 

n — Nr 


A*(t)i?i + (l-A*(t))§E§i?2 


\*{i) + (1 - 


= R{X*{i)). 


This completes the proof of the Proposition. 


B. Proof of Proposition 6

We already established (106). Using Proposition 12, we now recognise that all the fractions converge to their respective limits. Hence,

{n-Nf - Nf) ^ [Y^-Yf- Yf Y^ - Yf 
n 


Zii (n) 

lim inf — - - > lim inf 


{Nf + 1 ) 


D 


Yf 


I f 


n 


Nf + 1 n — N'T 


+ ' 


-D 


n — Nf — Nr n — N"" 


= {X*{i,Ri,R2){i))D{Ri\\R) + (1 - {X*{i,Ri,R2)m [Z 


(K-l) 


= D*{i, i?i, i? 2 ) almost surely. 


(128) 

(129) 

(130) 


Similarly, by using $\Gamma(x + 1) = x! \le x^x e^{-x + \frac{1}{12x}} \sqrt{2\pi x}$, and following the steps leading to (106) with limsup instead of liminf, it can be shown that $\limsup_{n \to \infty} Z_{ij}(n)/n \le D^*(i, R_1, R_2)$ almost surely. It follows that

\[ \lim_{n \to \infty} \frac{Z_i(n)}{n} = D^*(i, R_1, R_2) \quad \text{almost surely,} \]

which establishes Proposition 6. ■


From the lower bound on the expected stopping time, we know that $E[\tau(\pi_M(L))]$ grows to infinity as $L \to \infty$, but we now show that $\tau(\pi_M(L))$ grows to infinity in the almost sure sense also.

Lemma 13: Fix $K \ge 3$. Let $\Psi = (i, R_1, R_2)$ be the true configuration. Consider the policy $\pi_M(L)$. Then,

\[ \liminf_{L \to \infty} \tau(\pi_M(L)) = \infty \quad \text{almost surely.} \tag{131} \]

Proof: It is evident that the sequence of random variables $\tau(\pi_M(L))$, indexed by $L$, is non-decreasing in $L$. Hence, it suffices to show that, as $L \to \infty$,

\[ P(\tau(\pi_M(L)) < n) \to 0 \quad \text{for all } n. \tag{132} \]
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To see this, observe that 

limsupP {t{ttm{L)) < n) = limsupP ( max Zj{l) > log((iT — 1)L) for some j 


L — 


L—^c 


KKn 


K 


< limsup ^^P(Z,(0>log((iT-l)P)) 




i=i 1=1 


< lim sup — Yrc? —TY7T 
L-s-oo log((P:-l)L) 


< limsup^—T ——- 

L-).oo log((Pr-1)L) ^ 


K n 

J2J2E[i+2{Y^r] 

K n 

EE [l + 2Z^(max{Pi, P 2 } + (inax{Pi, P 2 })^)] 

i=i 1=1 


(133) 

(134) 

(135) 


= 0 . 


Inequality (133) follows from the union bound. In inequality (135) we have used the convexity bound $E[(\sum_{k=1}^{l} X_k)^2] \le l \sum_{k=1}^{l} E[X_k^2]$, and also that for Poisson random variables $E[X^2] = E[X] + E[X]^2$. Inequality (134) is obtained by bounding $Z_j(l)$ as follows:

f(X\A^\H=j) \ 


Z,{1) = log 


< log 


maxfe^j /(X^ = i) 


f{X^,A^\H = j) 
f{X\Ai\H = k) 


for some k ^ j 


/ y/ \ / F* - 


l-Nl 


- (F' - Yh 


n'log 


n 

K 


-Yi + {Y^-Yl)\og 


yi _ yi^ 

i-K 


- (Y^ - Yl) 


NiD 




\zl \zl 

-^l|l 

l-Ni 


(136) 

(137) 

(138) 

(139) 


(140) 

(141) 


< (f;)^ + (f'-f;) +^- 

< (F/)2 + (F'-F/)2 + / 

< 2 (F ')2 + L 

Inequality ( |136| l follows by upper bounding the numerator in by the maximum likelihood function and lower bounding 
the denominator by choosing the maximum likelihood function with respect to an arbitrary k ^ j instead of the maximiser. 
Inequality ( |139| l follows by recognising that the terms inside square brackets in ( |138|l ca n be written as a sum of relative 
entropy terms mi nus a n 1. Also, we upper bound xlog{x/N) — x hy x'^. Inequality (140i follows by ignoring the negative 
terms. Inequality (141 1 follows by upper bounding Yj and F* — Yj by YK ■ 

In Proposition 1^ we showed that, as n —> 00 and under the non-stopping policy ttm, Zi{n)/n converges to D*{i,Ri,R 2 ). 
We now show that, as L —>• 00 , Zi{T{'KM{L)))/T{'nM{L)) D*{i, Ri, R 2 ). 


Lemma 14: Fix $K \ge 3$. Let $\Psi = (i, R_1, R_2)$ be the true configuration. Consider the policy $\pi_M(L)$. We then have

\[ \lim_{L \to \infty} \frac{Z_i(\tau(\pi_M(L)))}{\tau(\pi_M(L))} = D^*(i, R_1, R_2) \quad \text{almost surely.} \tag{142} \]

Proof: It follows from Proposition 6 and Lemma 13. ■

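C. Proof of Proposition 7

We now have all the ingredients to prove the main achievability result of Proposition 7. By the definition of $\tau(\pi_M(L))$, we have that $Z_i(\tau(\pi_M(L)) - 1) < \log((K-1)L)$ at the previous slot. Using this we get

\[ \limsup_{L \to \infty} \frac{Z_i(\tau(\pi_M(L)) - 1)}{\log(L)} \le \limsup_{L \to \infty} \frac{\log((K-1)L)}{\log(L)} = 1. \tag{143} \]

Substituting (142) in (143), we get

\begin{align}
\limsup_{L \to \infty} \frac{\tau(\pi_M(L))}{\log(L)} &= \limsup_{L \to \infty} \frac{\tau(\pi_M(L))}{Z_i(\tau(\pi_M(L)) - 1)} \cdot \frac{Z_i(\tau(\pi_M(L)) - 1)}{\log(L)} \tag{144} \\
&\le \frac{1}{D^*(i, R_1, R_2)}. \tag{145}
\end{align}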
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exp 


< CX). 


A sufficient condition to establish convergence of the expected stopping time is to show that 

log(L) J 

Without loss of generality assume Ri < R 2 , such that i?i < < R'^ax < ^ 2 , where and R'^y^^x defined in 

and ( 0 , respectively. Let e > 0 be an arbitrary constant. Let ck be as in Proposition We then have 


lim sup E 

L—^oo 


lim sup E 

L—^oo 


g log(Z/) 


= lim sup / P 
L—)-oo Jx>0 


r(7rM(-L)) 

log(L) 


> log(a;) ) dx 


< lim sup / P {R{ttm{L)) > [log(a;) log(L)J) dx. 
L—>-oo J x>0 


Let us now define 


( 2>{l + e) \og{{K-l)L) 1 

“ ■ ^^^\CKD{Ri\\R!yyyyyy)\0g{L)^\0g{L) 


For X < u{L) let us upper bound the probability by 1. We then get the right-hand side of (147 1 to be 
limsup f P{P{ttm{L)) > [log(x) log(L)J) dx 

L—>-oo Jx>0 


< limsup 

L—¥00 


{L)+ f P {P{-km{L)) > [log(x)log(L)J) dx 

J x'>u{L) 


(146) 

(147) 

(148) 

(149) 

(150) 


Recognising that P {t'^{'Km{L)) > [log(x) log(L)J) is constant in the interval 

/ n -f 1 \ 


X G 


exp 


iog(L)J’"^p'viog(i)y 


and recognising that the interval length is upper bounded by exp further upper bound (150i by 

limsup f P{P{-km{L)) > [log(x) log(L)J) dx 
L^oo J x>0 

,, .exp(^)p(e(,„(L))>„)<i. (152) 


(151) 


n>[log(«(L)) log(L)J 


< exp 


( 3(1+ e) 

\CKD{Ri\\R'yyy^j J ' 


lim sup 


E 


n>[log(-ii(L)) log(L)J 


/ u-f 1 \ 


P {Zi{n) < log((iL - 1)L)) dx. 

(153) 


To show that the right-hand side of ( |153| l is finite, it suffices to show that for all 

M / /.M 3(l-f e)log((iT-l)L) 

„>ll„g(„(L))l„g(L)J> 

and for sufficiently large L, there exist constants 7 > 0 and 0 < B < 00 such that 

P iZy{n) < log((iT - 1)L)) < Se-T'". (154) 

We now show that such an exponential bound does exist. 

Lemma 15: Fix K > i. Fix L > 1. Let 4' = (i, be the true configuration. Let u{L) be as in (148 1 . Then, there 

exist constants 7 > 0 and Q < B < 00 , independent of L, such that for all n > [u(L) log(L)J, we have 


P {Zy{n) < logiiK - 1)L)) < Be-'^^. 

Proof: The following upper bounds for P {Zi{n) < log((iL — 1)L)) is self evident 

P{Zy(n)<\og{{K-l)L))) 

= P (minZij(n) < log((iT — 1)L) 

\3^i 

- < iog((^- 1 )^)) • 


(155) 
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It now suffices to show that for every j ^ i the probability term in the above expression is exponentially bounded. We upper 
bound Zij{n) in the same way as we earlier did in ( 1061 . 


P{Z,^{n)<\og{{K-l)L)) 


< P (Aff + 1)D 




iV” + 1 n — N’, 


'Y'n _ \ / y^n _ y^n _ y^n y^n _ y^n 

1 -7V")d I -* - II 


/ Y^ — Y‘ 

- N^D I ^ " 


y^n _ y^n 

_ i 

n — iVf + 1 


Y — w” + 


y^n _ y^n 

j 




. n — NJ^ — W" n — 

\ * J J 

' y^n _ vn _ y^n ^rn _ \ 

^ 

n — iV" — Y n — Nf- + 1 y 




log 


iV" + l 


+ log 


' v/27r(y» - y») 
n — + 1 


' < \og{{K - l)L) 


Using union bound, we upper bound ( |156| l by a sum of probability terms as given next. 
P(Z,,(n)<log((X-l)L)) 

< P [(iVr + 1) (d - DiRiWRLj) < -e'n 


+ P (n - iVf - Y)D 


y^n _ y/'n _ y^n y/’n _ y^ri 


n — iV" — N'^ n — N'^ 

^ J J 


< —en 


/ / F” F" - F 

+ P -7V”P' ^ II --* 


Nn "n- m + l 

J ^ , 


< —e n 



n — n — NJ^ + 1 


< —en 


< —en 



+ P ((iVy + 1 )P(Pi||P:„,J - 8 e'n < log((if - 1)P)). 
Let us choose 0 < e" < ckI^, so that 

>CKil + e''). 

r - - e ) 

We then we choose e' > 0 such that 


< —en 


1 - CKil - e") 

3(1 + e)(cK(l - - 8 e') > ckD{R^\\R'^ 


(156) 


(157) 


(158) 


so that 

P ((iVf + 1 )P(Pi||P:„,J - 8 e'n < log((iL - 1)L), (iV" + 1) > c^(l - e")^) = 0 (159) 

for all n under consideration, i.e., for all 

M / 3(1+ e)log((A:- 1)P) 

n > log u L log P > ^ ntp p, / ■ 

ckD[Ri\\R^^J 

The last term in ( |157| i can then be upper bounded by 

P ((iV” + l)P(Pi \\R'^,J - 8e'n < log((iT - 1)P)) 

< P ((iVy + 1 )P(Pi||p:„„) - 8 e'n < log((iT - 1)L), (IV” + 1) > ck(1 - e")n) 

+ P((iVr + l) <c^(l-e")n) 

= 0 + P((Aff+ 1) <c^(l-e")n) (160) 

f^^TT 

<exp(- —). (161) 

Equality (160) follows from (159). From the lower bound $c_K$ on the sampling probabilities, we recognise that $(N_l^n - n c_K)$ is a bounded difference sub-martingale for all $l$. Hence, inequality (161) follows from the Azuma-Hoeffding inequality for bounded difference sub-martingales. Note that only the last term in (157) is dependent on $L$. By the choice of $\epsilon'$ and for all $n$ under consideration, and from (161), we have shown that it decays exponentially with $n$, independent of $L$.

It now suffices to show that each of the other terms in (157) decays exponentially with $n$. Let us now look at the first term in (157).


P (ivr + 1) 77 






A^” + l" n-m 


-D{Ri\\E!^i^) < -e'n 


< P (iVr + 1) 77 




yn _ yn 


7V" + 1" n-m 

. * J 


- D{R^\\R'^,^) < -e'n, m, >ck{1- e")n V/ 


+ <CK{l-e")n) . 


(162) 


All the terms inside the summation in (162 1 have exponential bounds from Proposition and from Azuma-Hoeffding inequality 
for bounded difference sub-martingales. The hrst term in (|162|l can be further upper bounded by, 


P (TVf +1) P 


yn 


yn _ yn 


AT" + 1 " n- m 


- P(PillPLn) < m >Ck{ 1- e")n V/ 


< P (iVf + 1) P 


yn 


yn _ yn 


TV” -f 1" n- m 

. * J 


/ -Yr 


< —e n, 


m > ck{ 1 - e")n Vj', 




■ -N: 


> Ri 


yn _ yn 


n — min’ 


.-ivr 


< R 2 


+ p 


yn _ yn 

n-m 


<R'^,„, m>CK{l-e")nWf]+P 


yn _ y 


n — N‘, 


jr > P 2 , iV," > ck{ 1 - e")n V/ . (163) 


Inequality (163) follows by replacing $D(R_1 \| R'_{\min})$ by a larger $D\left(R_1 \,\Big\|\, \frac{Y^n - Y_j^n}{n - N_j^n}\right)$, using the fact that $D(x\|y)$ is monotonically
increasing in $y$ for $y > x$. Let us now consider the first term in (163). Recognise that we have restricted $\frac{Y^n - Y_j^n}{n - N_j^n}$ to lie in a
compact interval $[R'_{\min}, R_2]$. Further, since $D(x\|y)$ is jointly continuous in $(x,y)$ and since the second argument is restricted
to a compact set, we can upper bound the first term in (163), for a suitable $\delta$, by


P (iVf -P1) P 


yn 


AT" -pi n — N', 


yn _ yn \ / yn _ yn 

' - P Pill 


n — m 


< —e n, 


m > ck(1 - e")n V/, 


yn _ yn 


-At; 


> Pi 


yn _ yn 


n — -^^rmn') 


- Atp 


< P 2 


< P 


yn 
^ i 

TVf-pl 


-Pi 


> <5,, m > ck{1 - e")n V/ . 


(164) 


We recognise that (164) can be expressed as the probability of the deviation of a martingale difference sequence from zero,
which we know can be exponentially bounded using the martingale concentration bounds of De la Pena [Theorem 1.2A],
given in (95).
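The monotonicity of $D(x\|y)$ in its second argument, used in passing to (163), can be checked numerically. The sketch below assumes that $D(x\|y)$ denotes the relative entropy between Poisson laws with means $x$ and $y$, namely $x\log(x/y) - x + y$; this closed form and the helper names are assumptions for illustration. Under this form, $\partial D/\partial y = 1 - x/y$, which is positive exactly when $y > x$.

```python
import math


def poisson_kl(x, y):
    """Relative entropy between Poisson(x) and Poisson(y) (assumed form of D(x||y))."""
    if x < 0 or y <= 0:
        raise ValueError("need x >= 0 and y > 0")
    if x == 0:
        return y  # limit of x*log(x/y) - x + y as x -> 0
    return x * math.log(x / y) - x + y


def is_increasing_in_second_arg(x, y_lo, y_hi, steps=200):
    """Check on a grid that y -> D(x||y) is strictly increasing on [y_lo, y_hi]."""
    ys = [y_lo + k * (y_hi - y_lo) / steps for k in range(steps + 1)]
    vals = [poisson_kl(x, y) for y in ys]
    return all(b > a for a, b in zip(vals, vals[1:]))


if __name__ == "__main__":
    # Increasing for y > x, but not for y < x.
    print(is_increasing_in_second_arg(1.0, 1.0, 3.0))
    print(is_increasing_in_second_arg(2.0, 0.5, 1.5))
```

A grid check of this kind is only a sanity test; the derivative computation above is what establishes the property.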


Let us define $R''_{\min} := R'_{\min} + c_K \epsilon'' (R_2 - R_1)$ and $R''_{\max} := R'_{\max} - c_K \epsilon'' (R_2 - R_1)$. Let $\epsilon''' > 0$ be such that
$R'_{\min} + 2\epsilon''' < R''_{\min}$ and $R''_{\max} + 2\epsilon''' < R'_{\max}$. We then recognise that, given the event $\{N_{j'}^n > c_K(1 - \epsilon'')n \ \forall j'\}$, the event


N^Ri + {n-N^-N^)R2 

n — 


^ (1 — cx(l “t" + c/c(l + c")R2 — R 


// 

min 


is also true. Then, the following statements are true 


yn _ yn 

n — 


< Pi 


'mzn f — 


yn _ yn 

_li_ ^ td" _ 

— jV” ^ ^rnin ^ 


c 


c 


■yn-yn jyn- m)R2 


n — 


< 


n — 


— e 


yn _ yn ]yn _ ^yn _ yn)^^ 


n — TV” 


.-m- 


> e" 


(165) 
Similarly, given the event $\{N_{j'}^n > c_K(1 - \epsilon'')n \ \forall j'\}$, we can show that

N^^Ri + {n-N^- N:^)R 2 


Yn _ 




> R 


2 ( C 


■y^n _ -y^n 




n — iV" 


> e" 


(166) 


From (165) and (166), the second and third terms in (163) can then be upper bounded by


yn _ yn 

n-m 


< 2P 


m>CK{l-e")n\ff]+P 




n-Nf 


> i? 2 , m > ck{ 1 - e")n Vj' 


( 

j 

NJ^Ri -f (n - Vf - Ny)R 2 

1 

n-N^ 

1 

e 


> e" 


Nf^>CK{l- e”)n Vj' 


(167) 


Again, we recognise that (167) can be expressed as the probability of the deviation of a martingale difference sequence from
zero, which we know can be exponentially bounded using the martingale concentration bounds of De la Pena [Theorem
1.2A], given in (95).


Let us now look at the other terms in (157). The second term is identically zero, as the left-hand side is always positive.

Arguments similar to those of the first term hold for the third and fourth terms. For the fifth and sixth terms, the left-hand sides
converge to a constant, while the right-hand side goes to negative infinity, and thus it is straightforward to obtain exponential
bounds for these terms. Similarly, for the seventh and eighth terms, the left-hand side goes to negative infinity at a logarithmic
rate, while the right-hand side goes to negative infinity at a faster linear rate, and again it is straightforward to obtain exponential
bounds for these terms. This completes the proof of the lemma. ■

This completes the proof of our main achievability result of Proposition 7. ■